How to (pre)filter data used in a visualization?

AxelR · February 7, 2020, 10:51am

Hello,

I'm trying to build a histogram that counts only the first occurrence (chronologically speaking) of each recorded event in the corresponding period (in my data, a specific event can occur several times, with a different outcome each time). From what I have gathered so far, this might be done by using data aggregations.

However, I'm having trouble finding examples in Kibana of how to give an aggregation as an input to filter the elements being counted...

I'm not sure I have been clear enough, so do not hesitate to ask for further info.

Thanks in advance everybody

flash1293 · February 12, 2020, 12:52pm

Hi @AxelR,

I'm not aware of a way to do this at query time - you have to make sure your data is already indexed in a "de-duped" way.

One way to do this is to set up a transform job as described here: https://www.elastic.co/guide/en/elasticsearch/reference/7.6/put-transform.html
This will continuously run aggregations on your data and store the pre-aggregated results, so you can build visualizations (like a histogram) on top of them.

Let's say you detect duplicates of an event by its event_id field. By grouping by event_id and adding the min of your timestamp field to the result, the resulting index will contain only one document per event id (with the first occurrence as its timestamp):

PUT _transform/first_event_transform
{
  "source": {
    "index": "all_events"
  },
  "pivot": {
    "group_by": {
      "event": {
        "terms": {
          "field": "event_id"
        }
      }
    },
    "aggregations": {
      "first_occurrence": {
        "min": {
          "field": "timestamp"
        }
      }
    }
  },
  // ...
}

Based on this index you can create your histogram aggregation as usual.
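
To make the idea concrete, here is a minimal Python sketch of what the pivot above computes, followed by a daily histogram on the result. It is only an illustration, not how Elasticsearch runs it: the sample documents are made up, and the helper names `first_occurrences` and `daily_histogram` are hypothetical.

```python
from collections import Counter
from datetime import datetime

# Hypothetical sample documents; field names follow the transform example.
all_events = [
    {"event_id": "a", "timestamp": "2020-02-01T10:00:00"},
    {"event_id": "a", "timestamp": "2020-02-03T09:00:00"},
    {"event_id": "b", "timestamp": "2020-02-02T12:00:00"},
    {"event_id": "b", "timestamp": "2020-02-01T08:30:00"},
    {"event_id": "c", "timestamp": "2020-02-03T15:00:00"},
]


def first_occurrences(docs):
    """Group by event_id and keep the minimum timestamp of each group."""
    first = {}
    for doc in docs:
        ts = datetime.fromisoformat(doc["timestamp"])
        key = doc["event_id"]
        if key not in first or ts < first[key]:
            first[key] = ts
    return first


def daily_histogram(first):
    """Count first occurrences per calendar day."""
    return Counter(ts.date().isoformat() for ts in first.values())


first = first_occurrences(all_events)
histogram = daily_histogram(first)
print(dict(histogram))
```

Each event is counted once, on the day it first appeared, no matter how many later documents share its event_id.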

AxelR · February 12, 2020, 3:13pm

Thanks, I think I understand the general idea!
Since the transform job creates a new index, I guess I also have to store the outcome (correct/incorrect) of the first test, don't I?

flash1293 · February 12, 2020, 4:26pm

If you want to visualize it, yes - it's similar to a SQL query grouping by the event id: every field you want to access has to be defined together with an aggregation (because there could be multiple documents within each group).

For fetching the outcome of the first event, you probably have to resort to a scripted metric: https://www.elastic.co/guide/en/elasticsearch/reference/master/search-aggregations-metrics-scripted-metric-aggregation.html
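
As a rough illustration of the logic such a scripted metric would need, the Python sketch below mimics its four phases (init, map, combine, reduce) for a single event_id group. This is not Painless and not the actual aggregation request; the sample data, the `outcome` field name, and all function names are assumptions made for the example.

```python
# Hypothetical documents of one event_id group, spread over two shards.
shards = [
    [
        {"timestamp": 1581000000, "outcome": "incorrect"},
        {"timestamp": 1581200000, "outcome": "correct"},
    ],
    [
        {"timestamp": 1580900000, "outcome": "correct"},
    ],
]


def init_state():
    """init phase: per-shard state, no document seen yet."""
    return {"first": None}


def map_doc(state, doc):
    """map phase: remember the earliest document seen on this shard."""
    current = state["first"]
    if current is None or doc["timestamp"] < current["timestamp"]:
        state["first"] = {"timestamp": doc["timestamp"], "outcome": doc["outcome"]}


def combine(state):
    """combine phase: return the shard's partial result."""
    return state["first"]


def reduce_states(partials):
    """reduce phase: pick the earliest document across all shards."""
    candidates = [p for p in partials if p is not None]
    if not candidates:
        return None
    return min(candidates, key=lambda p: p["timestamp"])["outcome"]


def first_outcome(shards):
    partials = []
    for docs in shards:
        state = init_state()
        for doc in docs:
            map_doc(state, doc)
        partials.append(combine(state))
    return reduce_states(partials)


print(first_outcome(shards))
```

The point of carrying the timestamp along with the outcome is that each shard only sees part of the group, so the final choice has to be made in the reduce step.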

If you are just after all fields of the first document, a solution using a Logstash pipeline is probably a better fit: https://www.elastic.co/blog/how-to-find-and-remove-duplicate-documents-in-elasticsearch

You need an additional service (Logstash) to process the data, but it's more straightforward for this kind of thing. Transforms are better suited if you just want to access aggregations of the groups.

system · March 11, 2020, 4:26pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
