SkaETL

Introduction

SkaETL is a 100% Java ETL developed entirely by SkaLogs. It is an innovative real-time ETL which allows the user to perform data (Logs) transformations using a set of guided workflows. It is one of the key elements of a successful Log Management project, because data transformation is usually one of the most complicated aspects to manage.

SkaETL provides multiple Log processing features which simplify the difficult work of log analytics:

  • Workflows: Ingestion (“Consumer”), Metrics Process, Referentials
  • Ingestion Process: workflow for defining an ingestion process
    • Parsing, Transformation, Normalization, Aggregation
    • Validation, Filtering, Output (ES, HDFS, Notifications)
  • Metric Process: workflow for calculation of Indicators and Metrics
    • Uses Compute Templates (“SkaLogs Templates”) for standard calculations
    • Uses SkaLang, the SkaLogs proprietary language (“SkaLogs Language”), for complex calculations
  • Grok Parsing simulation (see the example after this list)
  • Referentials: creating data referentials (repositories) for later reuse
  • Preconfigured CMDB repository (directory)
  • Event-Based Alerts and Notifications (Incidents, Thresholds): uses the Alert Module (“SkaLogs Alerts”)
  • Storing processed Logs in indexes in ElasticSearch or HDFS
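
As an example of the grok parsing exercised by the simulation tool, the standard pattern below (a generic grok illustration, not taken from a SkaETL workflow) turns a raw Apache access-log line into named fields:

    Pattern:  %{COMMONAPACHELOG}
    Input:    127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "GET /index.html HTTP/1.0" 200 2326
    Output:   clientip=127.0.0.1, auth=frank, timestamp=10/Oct/2000:13:55:36 -0700,
              verb=GET, request=/index.html, response=200, bytes=2326

Fields extracted this way can then be normalized, validated and filtered by the rest of the ingestion workflow.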

Characteristics

No other Log Management solution (Open Source or Proprietary) provides an ETL like SkaETL. Its characteristics are as follows:

  • REAL TIME: real-time streaming, transformation, analysis, standardization, calculation and visualization of all your ingested and processed data (see the sketch after this list)
  • GUIDED WORKFLOWS:
    • Consumer processes (data ingestion pipelines) that guide you through ingestion, parsing, transformation, filtering, validation and normalization
      • avoiding the tedious task of transforming different types of Logs via Logstash
    • Optional metric computations, using simple functions or complex customized functions written in SkaLogs Language
    • Optional alerts and notifications
    • Referentials creation for further reuse
  • LOGSTASH CONFIGURATION GENERATOR: on-the-fly Logstash configuration generator
  • PARSING: Grok, Nitro and CEF, with a simulation tool
  • ERROR RETRY MECHANISM: automated mechanism for re-processing data ingestion errors
  • REFERENTIALS: create referentials for further reuse
  • CMDB: create IT inventory referential
  • COMPUTATIONS (METRICS): precompute all your metrics before storing the results in ES, reducing ES resource usage and the number of ES licenses required
  • SKALOGS LANGUAGE: Complex queries, event correlations (SIEM) and calculations, with an easy-to-use SQL-like language
  • MONITORING – ALERTS: Real-time monitoring, alerts and notifications based on events and thresholds
  • VISUALIZATION: dashboard to monitor in real time all your ingestion processes, metrics, referentials and the live Kafka stream
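
To make the real-time pipeline concrete, the sketch below shows a minimal Kafka Streams topology of the kind that underlies such a workflow: consume raw Logs from a Kafka topic, validate them, park failures on a retry topic for re-processing, and write the rest to an output topic. This is an illustration only, not SkaETL's internal code; the topic names and the validation rule are hypothetical.

    import java.util.Properties;
    import org.apache.kafka.common.serialization.Serdes;
    import org.apache.kafka.streams.KafkaStreams;
    import org.apache.kafka.streams.StreamsBuilder;
    import org.apache.kafka.streams.StreamsConfig;
    import org.apache.kafka.streams.kstream.KStream;
    import org.apache.kafka.streams.kstream.Predicate;

    public class LogPipelineSketch {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put(StreamsConfig.APPLICATION_ID_CONFIG, "log-pipeline-sketch");
            props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
            props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
            props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

            StreamsBuilder builder = new StreamsBuilder();
            KStream<String, String> rawLogs = builder.stream("raw-logs");

            // Validation rule (hypothetical): keep only events that look like JSON.
            Predicate<String, String> looksLikeJson =
                    (key, value) -> value != null && value.trim().startsWith("{");

            // Valid events are normalized (illustrative) and sent to the output topic.
            rawLogs.filter(looksLikeJson)
                   .mapValues(value -> value.trim())
                   .to("processed-logs");

            // Everything else is parked on a retry topic for later re-processing.
            rawLogs.filterNot(looksLikeJson)
                   .to("retry-logs");

            KafkaStreams streams = new KafkaStreams(builder.build(), props);
            streams.start();
            Runtime.getRuntime().addShutdownHook(new Thread(streams::close));
        }
    }

In SkaETL itself, the guided consumer workflow plays this role, so none of this code has to be written by hand.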

Features

  • SkaETL: data conversion tool (Logs) and metric calculation; easily manages the variety and variability of Logs, and adapts to any industry and business
  • Log ingestion pipelines: create Log ingestion pipelines via a guided workflow
  • Log repositories: create data repositories from Logs or external systems via a guided workflow
  • Error retry: manage ingestion errors via an automatic retry mechanism
  • Real time: visualize the transformed data flow (Logs) in real time (Kafka Stream)
  • Metric calculation (outside ES/Kibana): create simple or complex metrics without ES/Kibana, using the proprietary SkaLang language; this reduces the load on ES-dedicated resources and therefore the number of ES instances required (see the sketch after this list)
  • Alerts (thresholds): send alerts when a user-defined threshold is exceeded
  • Alerts (anomaly detection): send alerts when an anomaly is detected
  • Alerts (machine learning): discover known and unknown anomalies via hybrid ML methods (supervised and unsupervised / DNN)
  • Storage of processed Logs: multiple and varied data silos; the transformed data can be output to various storage infrastructures (ElasticSearch, Kafka, SysOut, Cassandra, MongoDB …)
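
As an illustration of the SQL-like style used for metric definitions, a threshold metric might be written along the following lines. The exact SkaLang grammar is defined by SkaETL and may differ; the topic, field and threshold names here are hypothetical:

    SELECT COUNT(*) FROM connection-logs
    WINDOW TUMBLING(5 MINUTES)
    WHERE type = "connection_failed"
    GROUP BY host
    HAVING result > 10

Because the counter is computed inside the ETL, only one aggregated result per host and per window reaches ElasticSearch, instead of every raw event.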

Guided Workflows

  • Stream and buffer: Log ingestion into the Kafka buffer
  • Log processing: transformation, parsing, normalization
  • Log ingestion: create Log ingestion pipelines
  • Error management: resubmit ingestion errors via an automatic management mechanism
  • Referentials (repositories): create data repositories from Logs or external systems
  • Metrics: calculate metrics either by simply defining them (thanks to templates) or by using the proprietary language, which makes it easy to define complex metrics
  • Monitoring and alerts: perform monitoring, send notifications and alerts
  • Visualization: visualize the flow of transformed data in real time (Kafka Stream)
  • Log output: send the transformed data and calculated metrics to ElasticSearch (or other outputs)

Once the data and metrics have been calculated in the ETL, they can be sent to ElasticSearch or to another storage technology (NoSQL, SQL, Data Lake).

Comparison