How to create a histogram of substring occurrences?

s7ygian · August 6, 2019, 6:58am

Hi,

I have the following three example JSON log entries:

{
  "hint_lists": [{
      "hint_list": ["Entity X with number 1 is unknown!", "Entity Y with number 2 is unknown!", "Entity Z with number 3 is unknown!"]
    }, {
      "hint_list": ["Entity X with number 1 is unknown!", "Entity Y with number 2 is unknown!", "Entity Z with number 3 is unknown!"]
    }
  ]
}

{
  "hint_lists": [{
      "hint_list": ["Entity X with number 1 is unknown!", "Entity Y with number 2 is unknown!"]
    }, {
      "hint_list": ["Entity X with number 1 is unknown!", "Entity Y with number 2 is unknown!"]
    }
  ]
}

{
  "hint_lists": [{
      "hint_list": ["Entity X with number 1 is unknown!"]
    }, {
      "hint_list": ["Entity X with number 1 is unknown!"]
    }
  ]
}

Now I want to create a histogram (unknown entity number -> count) in Kibana with the following result:

1: 6
2: 4
3: 2

How can I achieve that within Logstash and/or Kibana?

Best regards, s7ygian

christophilus · August 6, 2019, 4:26pm

I'm not entirely sure I follow, but if you're trying to extract the numeric value from your text fields, then run a histogram on that, it's going to be tricky. The easiest thing is to write the numbers as separate fields when the data is written. If you can't do that, you may be able to do something with scripted fields to do the extraction and then crunch some stats on that, but I'm not sure.
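A minimal sketch of the "write the numbers as separate fields" idea, in Python, assuming the entries look like the examples above. The function name `extract_entity_numbers`, the output field `unknown_entity_numbers`, and the regular expression are illustrative assumptions, not part of Logstash or Kibana; the same extraction could happen wherever the log entry is produced or ingested.

```python
import re

# Assumed message pattern, taken from the example entries: "... with number <N> ..."
NUMBER_PATTERN = re.compile(r"with number (\d+)")


def extract_entity_numbers(entry):
    """Return every entity number mentioned in any hint of one log entry."""
    numbers = []
    for hint_list in entry.get("hint_lists", []):
        for hint in hint_list.get("hint_list", []):
            match = NUMBER_PATTERN.search(hint)
            if match:
                numbers.append(int(match.group(1)))
    return numbers


# Second example entry from the question.
entry = {
    "hint_lists": [
        {"hint_list": ["Entity X with number 1 is unknown!",
                       "Entity Y with number 2 is unknown!"]},
        {"hint_list": ["Entity X with number 1 is unknown!",
                       "Entity Y with number 2 is unknown!"]},
    ]
}

# Hypothetical field name: store the numbers next to the original text.
entry["unknown_entity_numbers"] = extract_entity_numbers(entry)
print(entry["unknown_entity_numbers"])
```

With the numbers in their own numeric field, the aggregation no longer has to parse message text at query time.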

s7ygian · August 7, 2019, 6:29am

I also thought about extracting the count of every unknown entity into individual fields (e.g. unknown_1, unknown_2, unknown_3) per log entry. But how can I create the histogram mentioned above in Kibana out of these fields in the next step?

Note: The above log entries are only examples. There are about 100K entities with different entity numbers, all of which may be unknown once or several times in one or multiple log entries.

christophilus · August 7, 2019, 3:06pm

Maybe I'm misunderstanding what you're trying to do, but it sounds like a terms aggregation should get you the data you want.
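For context, a terms aggregation groups documents by the distinct values of a field and returns a document count per value. The Python sketch below builds such a request body and imitates the counting locally on the three example entries. It assumes a numeric field named `unknown_entity_number` (a hypothetical name) and one document per hint occurrence. The second assumption matters: a terms aggregation counts documents, so a number repeated inside a single document would be counted only once for that document.

```python
from collections import Counter

# Request body for an Elasticsearch terms aggregation.
# "unknown_entity_number" is a hypothetical field name.
terms_query = {
    "size": 0,  # return only the aggregation, no individual hits
    "aggs": {
        "unknown_entities": {
            "terms": {"field": "unknown_entity_number", "size": 10}
        }
    },
}

# Entity numbers per hint list for the three example entries.
entries = [
    [[1, 2, 3], [1, 2, 3]],
    [[1, 2], [1, 2]],
    [[1], [1]],
]

# Assumption: each occurrence is indexed as its own document.
docs = [
    {"unknown_entity_number": number}
    for entry in entries
    for hint_list in entry
    for number in hint_list
]


def local_terms_count(documents, field):
    """Imitate the per-value document counts a terms aggregation reports."""
    return Counter(doc[field] for doc in documents)


field_name = terms_query["aggs"]["unknown_entities"]["terms"]["field"]
histogram = local_terms_count(docs, field_name)
for number, count in sorted(histogram.items()):
    print(f"{number}: {count}")
```

Under these assumptions the local count reproduces the requested result (1: 6, 2: 4, 3: 2). In Kibana the equivalent would be a bar chart with a Terms bucket on that field and Count as the metric. Producing one event per occurrence could be done at ingest time, for example with the Logstash split filter, though the exact pipeline depends on the real data.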

s7ygian · August 21, 2019, 1:15pm

Could you explain that in more detail, please?

system · September 18, 2019, 1:23pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
