How to set up data validation functionality in Kibana and Elasticsearch

We have installed the ELK stack in our environment and we use it for monitoring and log viewing. Now we are trying to find new use cases for it. We are running ELK version 7.8, and in the Kibana UI we see a lot of new functionality, such as the machine learning features. We explored the Data Visualizer tool in Kibana a little, where we can simply upload CSV files and explore the data from there. Based on that exploration we came up with a use case for our platform, because we think it matches our process in which we generate CSV files and need to validate them. Our concept looks like this:

  1. our platform generates CSV files in daily batches ->
  2. we load those files into an Elasticsearch index ->
  3. we calculate/aggregate a summary of each daily batch, e.g. document count, null values for particular fields, some mathematical operations ->
  4. based on that summary we create validation rules that check whether our data is valid ->
  5. if the data isn't valid, we notify an external system

At the moment we are a bit stuck on point 3. We don't know how to properly generate a summary of the documents in an Elasticsearch index. We investigated the Transform feature in Kibana, which lets us aggregate/calculate data from one index and save the results in a destination index, but we can only apply basic functions like sum, max, and count, whereas we need more complex operations such as conditions. So here comes my first question: how can we solve such a case in ELK?
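To make this concrete, a basic pivot transform of the kind we tried looks roughly like this (all index, transform, and field names below are placeholders for our real ones):

    PUT _transform/daily-summary
    {
      "source": { "index": "daily-batches" },
      "dest":   { "index": "daily-batches-summary" },
      "pivot": {
        "group_by": {
          "timestamp": {
            "date_histogram": { "field": "timestamp", "calendar_interval": "1d" }
          }
        },
        "aggregations": {
          "doc_count":  { "value_count": { "field": "timestamp" } },
          "max_amount": { "max": { "field": "amount" } },
          "sum_amount": { "sum": { "field": "amount" } }
        }
      }
    }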

For point 4, regarding the validation rules, we explored Watcher in Kibana and we think we can use it to set them up. Based on the summary calculated in point 3, we can set thresholds which, when exceeded, trigger a specific alerting action. We are still not sure this is the correct approach, so I would like to hear whether it makes sense.
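As a rough sketch of what we have in mind: the watch below queries the summary index once a day and calls a webhook when a threshold is exceeded (the names, threshold, and endpoint are all invented for illustration):

    PUT _watcher/watch/daily-batch-validation
    {
      "trigger": { "schedule": { "interval": "1d" } },
      "input": {
        "search": {
          "request": {
            "indices": ["daily-batches-summary"],
            "body": {
              "query": {
                "bool": {
                  "filter": [
                    { "range": { "timestamp": { "gte": "now-1d/d" } } },
                    { "range": { "null_count": { "gt": 100 } } }
                  ]
                }
              }
            }
          }
        }
      },
      "condition": {
        "compare": { "ctx.payload.hits.total": { "gt": 0 } }
      },
      "actions": {
        "notify_external_system": {
          "webhook": {
            "scheme": "https",
            "host": "example.com",
            "port": 443,
            "path": "/validation-alerts",
            "method": "post",
            "body": "Batch validation failed: {{ctx.payload.hits.total}} summary documents exceeded the threshold"
          }
        }
      }
    }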

Transforms can also utilize custom scripted metric aggregations.

Examples: https://www.elastic.co/guide/en/elasticsearch/reference/current/transform-painless-examples.html
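For example, a conditional count, which the built-in pivot aggregations cannot express, can be written as a scripted metric inside the pivot's "aggregations" section. This is only a sketch; customer_id stands in for whatever field you want to check for missing values:

    "aggregations": {
      "null_count": {
        "scripted_metric": {
          "init_script": "state.count = 0",
          "map_script": "if (doc['customer_id'].size() == 0) { state.count += 1 }",
          "combine_script": "return state.count",
          "reduce_script": "long total = 0; for (def c : states) { if (c != null) { total += c } } return total"
        }
      }
    }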

OK, I tried these scripted metric aggregations and they look good for preparing such operations. But I observed one strange thing. In the pivot I group by timestamp with a date_histogram. If I don't add an aggregation with a script, my destination index is created with timestamp mapped as the date type, but when I add an aggregation with a script, I get a destination index with timestamp mapped as keyword. Can't I mix scripted aggregations with time-based grouping? I need this timestamp field in the new index because I will use the Watcher feature in Kibana.

Can you run your transform configuration through _preview? This will tell you exactly what the transform is going to do and how it will auto-create the destination index (note that you can create the dest index yourself before _start instead).
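For example, taking the pivot sketched earlier in this thread:

    POST _transform/_preview
    {
      "source": { "index": "daily-batches" },
      "pivot": {
        "group_by": {
          "timestamp": {
            "date_histogram": { "field": "timestamp", "calendar_interval": "1d" }
          }
        },
        "aggregations": {
          "null_count": {
            "scripted_metric": {
              "init_script": "state.count = 0",
              "map_script": "if (doc['customer_id'].size() == 0) { state.count += 1 }",
              "combine_script": "return state.count",
              "reduce_script": "long total = 0; for (def c : states) { if (c != null) { total += c } } return total"
            }
          }
        }
      }
    }

The response shows the documents the transform would produce, so you can check how the timestamp field comes out before anything is created.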

If the problem persists, please share some example configs so we can have a look. By design, transform does not create mappings for scripted_metric aggregations, because it is not possible to guess the right mapping; a script can return anything. There should not be any interaction between your group_by and your aggregations with respect to mappings; if there is, it would be a bug.
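So in your case you can create the destination index yourself with an explicit mapping and then start the transform, for example (using the placeholder names from the sketches above):

    PUT daily-batches-summary
    {
      "mappings": {
        "properties": {
          "timestamp":  { "type": "date" },
          "null_count": { "type": "long" }
        }
      }
    }

    POST _transform/daily-summary/_start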
