How to create a histogram of substring occurrences?

Hi,

I have the following three example JSON log entries:

{
  "hint_lists": [{
      "hint_list": ["Entity X with number 1 is unknown!", "Entity Y with number 2 is unknown!", "Entity Z with number 3 is unknown!"]
    }, {
      "hint_list": ["Entity X with number 1 is unknown!", "Entity Y with number 2 is unknown!", "Entity Z with number 3 is unknown!"]
    }
  ]
}

{
  "hint_lists": [{
      "hint_list": ["Entity X with number 1 is unknown!", "Entity Y with number 2 is unknown!"]
    }, {
      "hint_list": ["Entity X with number 1 is unknown!", "Entity Y with number 2 is unknown!"]
    }
  ]
}

{
  "hint_lists": [{
      "hint_list": ["Entity X with number 1 is unknown!"]
    }, {
      "hint_list": ["Entity X with number 1 is unknown!"]
    }
  ]
}

Now I want to create a histogram (unknown entity number -> count) in Kibana with the following result:

1: 6
2: 4
3: 2
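To pin down what is being counted: every message in every hint_list counts once, so an entity listed in both hint lists of one entry counts twice. Here is a minimal Python sketch of that computation over the three sample entries, outside Logstash and Kibana. It assumes the entity number always follows the words "with number"; keying on that phrase rather than on the trailing "is unknown!" also tolerates small wording differences at the end of the message.

```python
import re
from collections import Counter

X = "Entity X with number 1 is unknown!"
Y = "Entity Y with number 2 is unknown!"
Z = "Entity Z with number 3 is unknown!"

# The three sample log entries from the question; each has two identical hint lists.
ENTRIES = [
    {"hint_lists": [{"hint_list": [X, Y, Z]}, {"hint_list": [X, Y, Z]}]},
    {"hint_lists": [{"hint_list": [X, Y]}, {"hint_list": [X, Y]}]},
    {"hint_lists": [{"hint_list": [X]}, {"hint_list": [X]}]},
]

# Assumption: the number always appears as "with number <N>" in the message.
NUMBER_RE = re.compile(r"with number (\d+)")


def unknown_entity_histogram(entries):
    """Count every occurrence of an unknown entity number across all entries."""
    counts = Counter()
    for entry in entries:
        for hints in entry["hint_lists"]:
            for message in hints["hint_list"]:
                match = NUMBER_RE.search(message)
                if match:
                    counts[int(match.group(1))] += 1
    # Sort by entity number for a stable, readable result.
    return dict(sorted(counts.items()))


histogram = unknown_entity_histogram(ENTRIES)
for number, count in histogram.items():
    print(f"{number}: {count}")
```

Running it prints the three lines from the expected result (1: 6, 2: 4, 3: 2).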

How can I achieve that within Logstash and/or Kibana?

Best regards, s7ygian

I'm not entirely sure I follow, but if you're trying to extract the numeric values from your text fields and then build a histogram on them, that's going to be tricky. The easiest approach is to write the numbers into separate fields when the data is written. If you can't do that, you may be able to use scripted fields to do the extraction and then compute statistics on the result, but I'm not sure.
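One way to read "write the numbers as separate fields" is to emit one document per hint message at ingest time, each carrying the number in its own numeric field. The Python sketch below does that transformation on the sample entries. The field names entry_id, hint_list_index and entity_number are hypothetical, and the regex assumes every message contains "with number <N>". In Logstash, something similar could probably be assembled from the split filter (one event per array element, applied to both nested arrays) and the grok filter (regex extraction), but that configuration is not shown here.

```python
import re

X = "Entity X with number 1 is unknown!"
Y = "Entity Y with number 2 is unknown!"
Z = "Entity Z with number 3 is unknown!"

# The three sample log entries from the question.
ENTRIES = [
    {"hint_lists": [{"hint_list": [X, Y, Z]}, {"hint_list": [X, Y, Z]}]},
    {"hint_lists": [{"hint_list": [X, Y]}, {"hint_list": [X, Y]}]},
    {"hint_lists": [{"hint_list": [X]}, {"hint_list": [X]}]},
]

# Assumption: every relevant message contains "with number <N>".
NUMBER_RE = re.compile(r"with number (\d+)")


def flatten_entry(entry, entry_id):
    """Turn one log entry into one flat document per hint message."""
    docs = []
    for list_index, hints in enumerate(entry["hint_lists"]):
        for message in hints["hint_list"]:
            match = NUMBER_RE.search(message)
            docs.append({
                "entry_id": entry_id,            # hypothetical link back to the source entry
                "hint_list_index": list_index,   # which hint_list the message came from
                "message": message,
                # The extracted number as its own numeric field (None if no match).
                "entity_number": int(match.group(1)) if match else None,
            })
    return docs


docs = [doc for i, entry in enumerate(ENTRIES) for doc in flatten_entry(entry, i)]
print(len(docs))  # 12
print(docs[0])
```

Once each document holds a single entity_number, a histogram or terms visualization over that field in Kibana should count occurrences directly, because every occurrence is now its own document.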

I also thought about extracting, per log entry, the count of each unknown entity into individual fields (e.g. unknown_1, unknown_2, unknown_3). But how would I then create the histogram above in Kibana from these fields?
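With per-entry unknown_<N> count fields, the desired histogram is simply the sum of each such field over all entries. The Python sketch below builds those fields for the sample entries and then sums them; the unknown_<N> naming follows the post, while the helper names are hypothetical and the regex assumes every message contains "with number <N>".

```python
import re
from collections import Counter

X = "Entity X with number 1 is unknown!"
Y = "Entity Y with number 2 is unknown!"
Z = "Entity Z with number 3 is unknown!"

# The three sample log entries from the question.
ENTRIES = [
    {"hint_lists": [{"hint_list": [X, Y, Z]}, {"hint_list": [X, Y, Z]}]},
    {"hint_lists": [{"hint_list": [X, Y]}, {"hint_list": [X, Y]}]},
    {"hint_lists": [{"hint_list": [X]}, {"hint_list": [X]}]},
]

# Assumption: every relevant message contains "with number <N>".
NUMBER_RE = re.compile(r"with number (\d+)")


def per_entry_count_fields(entry):
    """Build the unknown_<N> count fields for one log entry."""
    counts = Counter()
    for hints in entry["hint_lists"]:
        for message in hints["hint_list"]:
            match = NUMBER_RE.search(message)
            if match:
                counts[f"unknown_{match.group(1)}"] += 1
    return dict(counts)


def histogram_from_count_fields(docs):
    """Sum every unknown_<N> field over all documents, keyed by N."""
    totals = Counter()
    for doc in docs:
        for field, value in doc.items():
            if field.startswith("unknown_"):
                totals[int(field[len("unknown_"):])] += value
    return dict(sorted(totals.items()))


docs = [per_entry_count_fields(entry) for entry in ENTRIES]
print(docs[0])                            # {'unknown_1': 2, 'unknown_2': 2, 'unknown_3': 2}
print(histogram_from_count_fields(docs))  # {1: 6, 2: 4, 3: 2}
```

The catch is scale: in Kibana each unknown_<N> field would need its own sum metric, and with many distinct entity numbers the index would need one field per number. Elasticsearch caps the number of fields per index through the index.mapping.total_fields.limit setting (1000 by default), so this layout is likely to hit that limit; keeping the entity number as a field value rather than in the field name avoids the problem.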

Note: The log entries above are only examples. There are about 100K entities with different entity numbers, each of which may be unknown once or several times, in one or more log entries.

Maybe I'm misunderstanding what you're trying to do, but it sounds like a terms aggregation should get you the data you want.
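A terms aggregation groups documents by the values of a field and reports a doc_count per bucket. The sketch below assumes the entity numbers have already been extracted into a numeric field called entity_number (a hypothetical name). It shows a possible request body plus a small Python simulation of the bucket counts; the simulation is not Elasticsearch itself. It illustrates one caveat worth checking: as far as I know, doc_count counts documents, not values. If each log entry is stored as one document with an array of numbers, a number repeated within one entry is counted once, and the sample data would give 1: 3, 2: 2, 3: 1 instead of 1: 6, 2: 4, 3: 2. Storing one document per hint message gives the desired counts.

```python
import json
import re
from collections import Counter

X = "Entity X with number 1 is unknown!"
Y = "Entity Y with number 2 is unknown!"
Z = "Entity Z with number 3 is unknown!"

# The three sample log entries from the question.
ENTRIES = [
    {"hint_lists": [{"hint_list": [X, Y, Z]}, {"hint_list": [X, Y, Z]}]},
    {"hint_lists": [{"hint_list": [X, Y]}, {"hint_list": [X, Y]}]},
    {"hint_lists": [{"hint_list": [X]}, {"hint_list": [X]}]},
]

# Assumption: every relevant message contains "with number <N>".
NUMBER_RE = re.compile(r"with number (\d+)")


def numbers_in(entry):
    """All entity numbers mentioned in one log entry, duplicates included."""
    numbers = []
    for hints in entry["hint_lists"]:
        for message in hints["hint_list"]:
            match = NUMBER_RE.search(message)
            if match:
                numbers.append(int(match.group(1)))
    return numbers


# Request body for a terms aggregation on the hypothetical field "entity_number".
TERMS_QUERY = {
    "size": 0,  # only the aggregation results are needed, not the hits
    "aggs": {
        "unknown_entities": {
            "terms": {"field": "entity_number", "size": 100},
        },
    },
}


def simulate_terms_agg(docs, field):
    """Mimic terms-aggregation doc_count: a document counts once per distinct value."""
    buckets = Counter()
    for doc in docs:
        value = doc[field]
        values = value if isinstance(value, list) else [value]
        for v in set(values):
            buckets[v] += 1
    return dict(sorted(buckets.items()))


# Layout A: one document per log entry, with entity_number as an array.
per_entry_docs = [{"entity_number": numbers_in(e)} for e in ENTRIES]
# Layout B: one document per hint message, with a single entity_number each.
per_hint_docs = [{"entity_number": n} for e in ENTRIES for n in numbers_in(e)]

print(json.dumps(TERMS_QUERY, indent=2))
print(simulate_terms_agg(per_entry_docs, "entity_number"))  # {1: 3, 2: 2, 3: 1}
print(simulate_terms_agg(per_hint_docs, "entity_number"))   # {1: 6, 2: 4, 3: 2}
```

The size of 100 is an arbitrary choice: a terms aggregation returns only the top buckets by count, so with about 100K entity numbers it will not list every number unless size is raised or a different approach (for example a histogram aggregation with an interval of 1) is used.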