String Occurrences within a String


(Todd Halfast) #1

I am new to Elastic and Kibana. So far I have had very little trouble developing basic visualizations and searches. However, I have run into a problem that I have seen other people post about but cannot seem to resolve.

I need to count the number of occurrences of a sub-string within a string field.

My index pattern is similar to this:

  • _id: type string (aggregatable)
  • content.content: type string
  • content.content.keyword: type string (aggregatable)

The content.content field is the HTML for a webpage (yes, I know I have to escape reserved characters when searching). An example of what I am trying to do is count the number of times a specific iframe element exists within this webpage.

My query syntax looks like this: content.content : "iframe class=\"lls_activity_embed\"". It works, but the results only return 1 hit per object. In plain English, this shows me "the number of objects that have <insert search term> in the content.content field."

What I want is the number of times <insert search term> occurs within the content.content field. I filtered the results down to a specific ID that I know has at least 2 iframes, and the result count was 1 (again, 1 ID that contains at least one iframe, versus the count of 2 iframes I was hoping for).

Is there any way to accomplish this via query and visualization without adding calculated fields/filters/tokens/new index patterns/etc.? Or, worse yet, building a C# job that leverages HTML Agility Pack to parse the inner elements and return counts to a DB that I could then very easily count and group via SQL?


(Marius Dragomir) #2

The way Elasticsearch works doesn't make it well suited to this kind of calculation. You could use a scripted field to count the number of substring occurrences, but it's not a recommended method, as it will put more strain on your cluster.
Here's an example of a scripted field that somebody used to find substrings:
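
Independently of that example, the counting logic itself is simple. Kibana scripted fields are written in Painless, whose syntax is close to Java, so here is a minimal Java sketch of counting non-overlapping substring occurrences. The class name SubstringCount, the method name countOccurrences, and the sample HTML are all made up for illustration; this is not the script referred to above.

```java
// Hypothetical helper: the class and method names are illustrative only.
final class SubstringCount {

    // Counts non-overlapping occurrences of needle inside haystack.
    // Returns 0 for null input or an empty needle.
    static int countOccurrences(String haystack, String needle) {
        if (haystack == null || needle == null || needle.isEmpty()) {
            return 0;
        }
        int count = 0;
        int from = 0;
        while (true) {
            int idx = haystack.indexOf(needle, from);
            if (idx < 0) {
                break;
            }
            count++;
            // Skip past this match so matches never overlap.
            from = idx + needle.length();
        }
        return count;
    }

    public static void main(String[] args) {
        // Made-up sample page containing two matching iframes.
        String html = "<div><iframe class=\"lls_activity_embed\" src=\"a\"></iframe>"
                + "<p>text</p><iframe class=\"lls_activity_embed\" src=\"b\"></iframe></div>";
        // Prints 2
        System.out.println(countOccurrences(html, "iframe class=\"lls_activity_embed\""));
    }
}
```

In a scripted field the text would have to come from the document itself, for example from the aggregatable content.content.keyword field. That is an assumption about the mapping: keyword fields created by default dynamic mapping skip values longer than their ignore_above limit, so a full HTML page may not be available there at all.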