Introduction
“SkaETL” is a 100% Java ETL developed entirely by SkaLogs. It is an innovative real-time ETL that lets users perform data (log) transformations through a set of guided workflows. It is a key element of a successful Log Management project, because data transformation is usually one of the most complicated aspects to manage.
SkaETL provides multiple log processing features that simplify the difficult work of log analytics:
- Workflows: Ingestion (“Consumer”), Metrics Process, Referentials
- Ingestion Process: workflow for defining the ingestion process
  - Parsing, Transformation, Normalization, Aggregation
  - Validation, Filtering, Output (ES, HDFS, Notifications)
- Metric Process: workflow for calculating indicators and metrics
  - Uses Compute Templates (“SkaLogs Templates”) for standard calculations
  - Uses SkaLang, the SkaLogs proprietary language (“SkaLogs Language”), for complex calculations
- Grok parsing simulation
- Referentials: creating data referentials (repositories) for later reuse
  - Preconfigured CMDB repository (directory)
- Event-based alerts and notifications (incidents, thresholds): uses the Alert Module (“SkaLogs Alerts”)
- Storing processed logs in indexes in ElasticSearch or HDFS
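As a sketch of what grok-style parsing does during ingestion, the snippet below extracts named fields from an Apache-style access-log line using a regular expression with named groups. The pattern, class name, and field names are illustrative assumptions, not SkaETL's actual grok implementation.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration of grok-style field extraction:
// named regex groups pull structured fields out of a raw log line.
public class GrokLikeParser {
    private static final Pattern APACHE_LINE = Pattern.compile(
        "^(?<client>\\S+) \\S+ \\S+ \\[(?<timestamp>[^\\]]+)\\] " +
        "\"(?<verb>\\S+) (?<path>\\S+) \\S+\" (?<status>\\d{3}) (?<bytes>\\d+|-)$");

    public static Map<String, String> parse(String line) {
        Matcher m = APACHE_LINE.matcher(line);
        if (!m.matches()) {
            // A parse failure would typically be routed to an error/retry queue.
            return Collections.emptyMap();
        }
        Map<String, String> fields = new LinkedHashMap<>();
        for (String name : List.of("client", "timestamp", "verb", "path", "status", "bytes")) {
            fields.put(name, m.group(name));
        }
        return fields;
    }

    public static void main(String[] args) {
        String line = "127.0.0.1 - - [10/Oct/2023:13:55:36 +0000] "
            + "\"GET /index.html HTTP/1.1\" 200 2326";
        System.out.println(parse(line));
    }
}
```

A parsing simulation tool, as mentioned above, amounts to running such a pattern against sample lines and showing the extracted fields (or the failure) before the pipeline goes live.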
Characteristics
No other Log Management solution (Open Source or proprietary) provides an ETL like SkaETL; its characteristics are as follows:
- REAL TIME: real-time streaming, transformation, analysis, standardization, calculations and visualization of all your ingested and processed data
- GUIDED WORKFLOWS:
  - Consumer processes (data ingestion pipelines) to guide you through ingestion, parsing, transformation, filtering, validation, and normalization
    - avoids the tedious task of transforming different types of logs via Logstash
  - Optional metrics computations via simple functions, or complex customized functions via SkaLogs Language
  - Optional alerts and notifications
  - Referentials creation for later reuse
- LOGSTASH CONFIGURATION GENERATOR: on-the-fly Logstash configuration generator
- PARSING: grok, nitro, and CEF, with a simulation tool
- ERROR RETRY MECHANISM: automated mechanism for re-processing data ingestion errors
- REFERENTIALS: create referentials for later reuse
- CMDB: create an IT inventory referential
- COMPUTATIONS (METRICS): precompute all your metrics before storing the results in ES (reduces ES resource usage and the number of ES licenses required)
- SKALOGS LANGUAGE: complex queries, event correlations (SIEM), and calculations with an easy-to-use SQL-like language
- MONITORING – ALERTS: real-time monitoring, alerts, and notifications based on events and thresholds
- VISUALIZATION: dashboard to monitor all your ingestion processes, metrics, referentials, and the Kafka live stream in real time
Features
| TYPE | DESCRIPTION |
| --- | --- |
| SkaETL | Data conversion tool (logs) and metric calculation. Easily manages log variety and variability; adapts to any industry and business |
| Log Ingestion Pipelines | Create log ingestion pipelines via a guided workflow |
| Creating Log Repositories | Create data repositories from logs or external systems via a guided workflow |
| Error Retry | Manage ingestion errors via an automatic retry mechanism |
| Real Time | Visualize the transformed data flow (logs) in real time (Kafka Stream) |
| Metric Calculation (outside ES / Kibana) | Create simple or complex (non-ES / Kibana) metrics using the proprietary SkaLang language. Reduces the load on ES-dedicated resources, thus reducing the number of ES instances required |
| Alerts (thresholds) | Send alerts when a user-defined threshold is exceeded |
| Alerts (anomaly detection) | Send alerts when an anomaly is detected |
| Alerts (Machine Learning) | Discover known and unknown anomalies via hybrid ML methods (supervised and unsupervised / DNN) |
| Storage of Logged Data | Multiple and miscellaneous data silos: allows specifying the output of the transformed data to various storage infrastructures (ElasticSearch, Kafka, SysOut, Cassandra, MongoDB …) |
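The threshold-alert feature in the table above can be illustrated with a minimal sketch: count events per fixed time window and fire an alert once the count in the current window reaches a user-defined threshold. The class, parameters, and windowing policy are hypothetical, not SkaETL's actual alerting API.

```java
// Hypothetical sketch of a windowed threshold alert: count events per
// fixed time window and fire once the count reaches the threshold.
public class ThresholdAlert {
    private final long windowMillis;
    private final int threshold;
    private long windowStart = 0;
    private int count = 0;

    public ThresholdAlert(long windowMillis, int threshold) {
        this.windowMillis = windowMillis;
        this.threshold = threshold;
    }

    /** Feed one event timestamp; returns true when this event trips the alert. */
    public boolean onEvent(long timestampMillis) {
        if (timestampMillis - windowStart >= windowMillis) {
            // Align the new window to a fixed boundary and reset the counter.
            windowStart = timestampMillis - (timestampMillis % windowMillis);
            count = 0;
        }
        count++;
        return count == threshold; // fire exactly once per window
    }

    public static void main(String[] args) {
        // Illustrative parameters: 3 events within one minute trigger an alert.
        ThresholdAlert alert = new ThresholdAlert(60_000, 3);
        for (int i = 0; i < 5; i++) {
            if (alert.onEvent(i * 1000L)) {
                System.out.println("ALERT: threshold exceeded at event " + (i + 1));
            }
        }
    }
}
```

Firing only on the event that crosses the threshold (rather than on every subsequent event) keeps notification volume down, which matters when alerts fan out to external channels.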
Guided WorkFlows
| TYPE | DESCRIPTION |
| --- | --- |
| Stream and Buffer | Log ingestion into the Kafka buffer |
| Log Processing | Transformation, parsing, normalisation |
| Log Ingestion | Create log ingestion pipelines |
| Error Management | Resubmit ingestion errors via an automatic management mechanism |
| Referentials (repositories) | Create data repositories from logs or external systems |
| Metrics | Calculate metrics by simply defining them (thanks to templates) or by using a proprietary language that makes it easy to define complex metrics |
| Monitoring and Alerts | Perform monitoring, notifications, and alerts |
| Visualisation | Visualize the flow of transformed data in real time (Kafka Stream) |
| Log Output | Send transformed data and calculated metrics to ElasticSearch (or other) |
Once the data and metrics are calculated in the ETL, they can be sent to ElasticSearch (or another storage technology – NoSQL, SQL, Data Lake).
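As an illustration of that final output step, the sketch below builds an Elasticsearch-style `_bulk` NDJSON payload for a batch of transformed events. The index name, field names, and string-only document shape are assumptions for the example, not SkaETL's actual output code.

```java
import java.util.List;
import java.util.Map;
import java.util.StringJoiner;

// Sketch of preparing transformed events for an Elasticsearch-style bulk
// endpoint: each document is preceded by an "index" action line (NDJSON).
public class BulkPayload {
    public static String build(String index, List<Map<String, String>> docs) {
        StringBuilder body = new StringBuilder();
        for (Map<String, String> doc : docs) {
            // Action line telling ES which index the next document goes to.
            body.append("{\"index\":{\"_index\":\"").append(index).append("\"}}\n");
            // Flat string-valued document serialized as a one-line JSON object.
            StringJoiner json = new StringJoiner(",", "{", "}");
            for (Map.Entry<String, String> e : doc.entrySet()) {
                json.add("\"" + e.getKey() + "\":\"" + e.getValue() + "\"");
            }
            body.append(json).append("\n");
        }
        return body.toString();
    }

    public static void main(String[] args) {
        String payload = build("logs-2023", List.of(Map.of("level", "ERROR")));
        System.out.print(payload);
        // The payload would then be POSTed to the cluster's /_bulk endpoint
        // with Content-Type: application/x-ndjson.
    }
}
```

Batching documents this way is what makes precomputing metrics in the ETL pay off: only the finished, normalized records hit the storage layer.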
Comparison


