Now I want to create a histogram in Kibana with a result like this:
textA: 6
textB: 4
textC: 2
How can I achieve that within Logstash and/or Kibana?
I thought about extracting the individual text counts into separate fields (e.g. nTextA, nTextB, nTextC) per log entry. But how would I then create the histogram described above from these fields across all log entries?
Note: The above log entries are only examples. I'm looking for a solution for about 100K different texts, all of which can occur once or several times in one or multiple log entries.
I think the proposed solution only works for a few terms (see the example above). But I'm interested in a generic solution for a large number of terms (I don't even know how many terms there are in total). As a result, potentially many fields (nTextA, nTextB, nTextC, ..., nTextX-1, nTextX) per log entry would have to be created in Logstash. So far, so good.
But how can a histogram then be created in Kibana from these many fields (nTextA, nTextB, nTextC, ..., nTextX-1, nTextX), summing each field over all log entries? The proposed sum aggregation lets you select only a single field; you cannot select many fields with a wildcard pattern like "nText*".
@s7ygian yes, you make a good point. Perhaps a better way would be to structure a new index for the particular query that you need. You could create a document for each term using logstash split - https://www.elastic.co/guide/en/logstash/current/plugins-filters-split.html - which would make querying and counting with kibana very simple.
Please note that I'm linking to docs for the latest release and you might be running a different version.
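To illustrate the split approach: assuming each event carries an array field (here hypothetically named `texts`) listing every text occurrence, a minimal filter sketch could look like this. The field name is an assumption; adapt it to however your parsing stage collects the texts.

```
filter {
  # Emit one event per element of the "texts" array
  # (field name is hypothetical - use whatever field
  # holds your list of extracted texts).
  split {
    field => "texts"
  }
}
```

With one document per text occurrence, a standard terms aggregation on the resulting keyword field in Kibana yields exactly the textA/textB/textC counts shown above, with no need for per-term fields like nTextA.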