Introduction
“SkaETL” is a 100% Java ETL developed entirely by SkaLogs. It is an innovative real-time ETL that lets users perform data (log) transformations through a set of guided workflows. It is a key element of a successful Log Management project, because data transformation is usually one of the most complicated aspects to manage.
SkaETL provides multiple log processing features that simplify the difficult work of log analytics:
- Workflows: Ingestion (“Consumer”), Metrics Process, Referentials
- Ingestion Process: workflow for defining the ingestion process
  - Parsing, Transformation, Normalization, Aggregation
  - Validation, Filtering, Output (ES, HDFS, Notifications)
- Metric Process: workflow for calculating indicators and metrics
  - Uses Compute Templates (“SkaLogs Templates”) for standard calculations
  - Uses SkaLang, the SkaLogs proprietary language (“SkaLogs Language”), for complex calculations
- Grok parsing simulation
- Referentials: creating data referentials (repositories) for later reuse
  - Preconfigured CMDB repository (directory)
- Event-based alerts and notifications (incidents, thresholds): uses the Alert Module (“SkaLogs Alerts”)
- Storing processed logs in indexes in ElasticSearch or HDFS
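As a sketch of what grok-style parsing does during ingestion, the snippet below extracts named fields from an Apache-style access-log line using a regular expression with named groups. The pattern, class name, and field names are illustrative assumptions, not SkaETL's actual grok implementation.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration of grok-style field extraction:
// named regex groups pull structured fields out of a raw log line.
public class GrokLikeParser {
    private static final Pattern APACHE_LINE = Pattern.compile(
        "^(?<client>\\S+) \\S+ \\S+ \\[(?<timestamp>[^\\]]+)\\] " +
        "\"(?<verb>\\S+) (?<path>\\S+) \\S+\" (?<status>\\d{3}) (?<bytes>\\d+|-)$");

    public static Map<String, String> parse(String line) {
        Matcher m = APACHE_LINE.matcher(line);
        if (!m.matches()) {
            // A parse failure would typically be routed to an error/retry queue.
            return Collections.emptyMap();
        }
        Map<String, String> fields = new LinkedHashMap<>();
        for (String name : List.of("client", "timestamp", "verb", "path", "status", "bytes")) {
            fields.put(name, m.group(name));
        }
        return fields;
    }

    public static void main(String[] args) {
        String line = "127.0.0.1 - - [10/Oct/2023:13:55:36 +0000] "
            + "\"GET /index.html HTTP/1.1\" 200 2326";
        System.out.println(parse(line));
    }
}
```

A parsing simulation tool, as mentioned above, amounts to running such a pattern against sample lines and showing the extracted fields (or the failure) before the pipeline goes live.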
Characteristics
No other Log Management solution (Open Source or proprietary) provides an ETL like SkaETL; its characteristics are as follows:
- REAL TIME: real-time streaming, transformation, analysis, standardization, calculations and visualization of all your ingested and processed data
- GUIDED WORKFLOWS:
  - Consumer processes (data ingestion pipelines) to guide you through ingestion, parsing, transformation, filtering, validation, and normalization
    - avoids the tedious task of transforming different types of logs via Logstash
  - Optional metrics computations via simple functions, or complex customized functions via SkaLogs Language
  - Optional alerts and notifications
  - Referentials creation for later reuse
- LOGSTASH CONFIGURATION GENERATOR: on-the-fly Logstash configuration generator
- PARSING: grok, nitro, and CEF, with a simulation tool
- ERROR RETRY MECHANISM: automated mechanism for re-processing data ingestion errors
- REFERENTIALS: create referentials for later reuse
- CMDB: create an IT inventory referential
- COMPUTATIONS (METRICS): precompute all your metrics before storing the results in ES (reduces ES resource usage and the number of ES licenses required)
- SKALOGS LANGUAGE: complex queries, event correlations (SIEM), and calculations with an easy-to-use SQL-like language
- MONITORING – ALERTS: real-time monitoring, alerts, and notifications based on events and thresholds
- VISUALIZATION: dashboard to monitor all your ingestion processes, metrics, referentials, and the Kafka live stream in real time
Features
| TYPE | DESCRIPTION |
| --- | --- |
| SkaETL | Data conversion tool (logs) and metric calculation. Easily manages log variety and variability; adapts to any industry and business |
| Log Ingestion Pipelines | Create log ingestion pipelines via a guided workflow |
| Creating Log Repositories | Create data repositories from logs or external systems via a guided workflow |
| Error Retry | Manage ingestion errors via an automatic retry mechanism |
| Real Time | Visualize the transformed data flow (logs) in real time (Kafka Stream) |
| Metric Calculation (outside ES / Kibana) | Create simple or complex (non-ES / Kibana) metrics using the proprietary SkaLang language. Reduces the load on ES-dedicated resources, thus reducing the number of ES instances required |
| Alerts (thresholds) | Send alerts when a user-defined threshold is exceeded |
| Alerts (anomaly detection) | Send alerts when an anomaly is detected |
| Alerts (Machine Learning) | Discover known and unknown anomalies via hybrid ML methods (supervised and unsupervised / DNN) |
| Storage of Logged Data | Multiple and miscellaneous data silos: allows specifying the output of the transformed data to various storage infrastructures (ElasticSearch, Kafka, SysOut, Cassandra, MongoDB …) |
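The threshold-alert feature in the table above can be illustrated with a minimal sketch: count events per fixed time window and fire an alert once the count in the current window reaches a user-defined threshold. The class, parameters, and windowing policy are hypothetical, not SkaETL's actual alerting API.

```java
// Hypothetical sketch of a windowed threshold alert: count events per
// fixed time window and fire once the count reaches the threshold.
public class ThresholdAlert {
    private final long windowMillis;
    private final int threshold;
    private long windowStart = 0;
    private int count = 0;

    public ThresholdAlert(long windowMillis, int threshold) {
        this.windowMillis = windowMillis;
        this.threshold = threshold;
    }

    /** Feed one event timestamp; returns true when this event trips the alert. */
    public boolean onEvent(long timestampMillis) {
        if (timestampMillis - windowStart >= windowMillis) {
            // Align the new window to a fixed boundary and reset the counter.
            windowStart = timestampMillis - (timestampMillis % windowMillis);
            count = 0;
        }
        count++;
        return count == threshold; // fire exactly once per window
    }

    public static void main(String[] args) {
        // Illustrative parameters: 3 events within one minute trigger an alert.
        ThresholdAlert alert = new ThresholdAlert(60_000, 3);
        for (int i = 0; i < 5; i++) {
            if (alert.onEvent(i * 1000L)) {
                System.out.println("ALERT: threshold exceeded at event " + (i + 1));
            }
        }
    }
}
```

Firing only on the event that crosses the threshold (rather than on every subsequent event) keeps notification volume down, which matters when alerts fan out to external channels.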
Guided WorkFlows
| TYPE | DESCRIPTION |
| --- | --- |
| Stream and Buffer | Log ingestion into the Kafka buffer |
| Log Processing | Transformation, parsing, normalisation |
| Log Ingestion | Create log ingestion pipelines |
| Error Management | Resubmit ingestion errors via an automatic management mechanism |
| Referentials (repositories) | Create data repositories from logs or external systems |
| Metrics | Calculate metrics by simply defining them (thanks to templates) or by using a proprietary language that makes it easy to define complex metrics |
| Monitoring and Alerts | Perform monitoring, notifications, and alerts |
| Visualisation | Visualize the flow of transformed data in real time (Kafka Stream) |
| Log Output | Send transformed data and calculated metrics to ElasticSearch (or other) |
Once the data and metrics are calculated in the ETL, they can be sent to ElasticSearch (or another storage technology – NoSQL, SQL, Data Lake).
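As an illustration of that final output step, the sketch below builds an Elasticsearch-style `_bulk` NDJSON payload for a batch of transformed events. The index name, field names, and string-only document shape are assumptions for the example, not SkaETL's actual output code.

```java
import java.util.List;
import java.util.Map;
import java.util.StringJoiner;

// Sketch of preparing transformed events for an Elasticsearch-style bulk
// endpoint: each document is preceded by an "index" action line (NDJSON).
public class BulkPayload {
    public static String build(String index, List<Map<String, String>> docs) {
        StringBuilder body = new StringBuilder();
        for (Map<String, String> doc : docs) {
            // Action line telling ES which index the next document goes to.
            body.append("{\"index\":{\"_index\":\"").append(index).append("\"}}\n");
            // Flat string-valued document serialized as a one-line JSON object.
            StringJoiner json = new StringJoiner(",", "{", "}");
            for (Map.Entry<String, String> e : doc.entrySet()) {
                json.add("\"" + e.getKey() + "\":\"" + e.getValue() + "\"");
            }
            body.append(json).append("\n");
        }
        return body.toString();
    }

    public static void main(String[] args) {
        String payload = build("logs-2023", List.of(Map.of("level", "ERROR")));
        System.out.print(payload);
        // The payload would then be POSTed to the cluster's /_bulk endpoint
        // with Content-Type: application/x-ndjson.
    }
}
```

Batching documents this way is what makes precomputing metrics in the ETL pay off: only the finished, normalized records hit the storage layer.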
Comparison


