Aggregation on substring of a specific field

Hi. I'm trying to aggregate on a substring of the values in a given field.

For example, I have 5 documents in an index "Petroglyph".
Each document includes a field called "Site".
The values of "Site" follow a pattern like 'site1<site2<site3<site4':

doc1 - "Site": "Asia<South Korea<Ulsan<Bangudae"
doc2 - "Site": "Europe<France<Dordogne<Grotte de Cussac"
doc3 - "Site": "Europe<France<Dordogne<Lascaux"
doc4 - "Site": "Asia<South Korea<Ulsan<Cheon-Jeon-Ri"
doc5 - "Site": "Africa<Namibia<Kunene<Twyfelfontein"

I want to aggregate on the 'site2<site3' part, keeping its order, so that the result looks like:

buckets: [
  {
    key: 'South Korea<Ulsan',
    doc_count: 2
  },
  {
    key: 'France<Dordogne',
    doc_count: 2
  },
  {
    key: 'Namibia<Kunene',
    doc_count: 1
  }
]

I have read and tried the Stack Overflow question "node.js - Elasticsearch aggregation on part of string, not full string", which seems very similar to mine. However, it did not work, I believe because it was written for a version earlier than 7.0.

I have also thought about splitting the value of Site on '<', saving the result in an array, and aggregating on 'array[1]<array[2]'.
But in my limited experience, that seems to add complexity and does not feel like the Elasticsearch way of doing it.
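
To make the splitting idea concrete, here is a minimal client-side sketch in Python. It only illustrates the intended key and the expected counts for the five sample documents; it does not talk to Elasticsearch, and the function name `middle_levels` is made up for this example.

```python
from collections import Counter


def middle_levels(site: str, sep: str = "<") -> str:
    """Return 'site2<site3' from a value shaped like 'site1<site2<site3<site4'."""
    parts = site.split(sep)
    if len(parts) < 3:
        raise ValueError(f"expected at least 3 levels, got: {site!r}")
    # parts[1] and parts[2] are the second and third levels, kept in order.
    return sep.join(parts[1:3])


# The five sample "Site" values from the question.
sites = [
    "Asia<South Korea<Ulsan<Bangudae",
    "Europe<France<Dordogne<Grotte de Cussac",
    "Europe<France<Dordogne<Lascaux",
    "Asia<South Korea<Ulsan<Cheon-Jeon-Ri",
    "Africa<Namibia<Kunene<Twyfelfontein",
]

# Count documents per derived key, shaped like the desired buckets.
counts = Counter(middle_levels(s) for s in sites)
buckets = [{"key": key, "doc_count": n} for key, n in counts.most_common()]

for bucket in buckets:
    print(bucket)
```

The same derived value could be stored in its own keyword field when indexing, so that a plain terms aggregation can run on it.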

What would be the way to solve this problem?

Thank you.

You could create a runtime field that does the substring calculation and use the resulting field in your aggregation.
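
As a rough sketch of that idea, the Python snippet below builds a search request body that defines a runtime field and aggregates on it. Runtime fields need Elasticsearch 7.11 or later. The names `site_region` and `by_region` are made up for illustration, and `Site.keyword` assumes the default dynamic mapping created a keyword subfield for "Site"; adjust it to your actual mapping.

```python
import json

# Painless script for the runtime field (assumption: "Site.keyword" exists).
# It splits the value on '<' and emits the 2nd and 3rd levels joined by '<'.
SCRIPT = (
    "if (doc['Site.keyword'].size() == 0) { return; } "
    "String[] parts = doc['Site.keyword'].value.splitOnToken('<'); "
    "if (parts.length >= 3) { emit(parts[1] + '<' + parts[2]); }"
)

# Body for a search request against the index (sent to its _search endpoint).
search_body = {
    "size": 0,  # only the aggregation result is needed, not the hits
    "runtime_mappings": {
        "site_region": {  # hypothetical runtime field name
            "type": "keyword",
            "script": {"source": SCRIPT},
        }
    },
    "aggs": {
        "by_region": {  # hypothetical aggregation name
            "terms": {"field": "site_region"}
        }
    },
}

print(json.dumps(search_body, indent=2))
```

The printed JSON can be pasted into Kibana Dev Tools as the body of a search request, or passed to a client library.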

Actually, I couldn't get your runtime field suggestion to work because of my Elasticsearch version. But thank you for the reply.

Okay, in older versions you have to create the new field at index time. Which version are you running?

v7.2.0, for both Elasticsearch and Kibana.
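
For a version like 7.2.0, one possible way to create the field at index time is an ingest pipeline with a script processor. The Python sketch below only builds the request bodies; it is untested against 7.2.0. The field name `SiteRegion` and the aggregation name `by_region` are made up, and the aggregation assumes `SiteRegion` is mapped as a keyword (with the default dynamic mapping you would aggregate on `SiteRegion.keyword` instead).

```python
import json

# Painless script for the script processor. It finds the first three '<'
# separators with indexOf and copies the text between the first and third
# one (the 2nd and 3rd levels) into a new field, here called "SiteRegion".
PIPELINE_SCRIPT = (
    "if (ctx.Site != null) { "
    "String s = ctx.Site; "
    "int a = s.indexOf('<'); "
    "int b = a < 0 ? -1 : s.indexOf('<', a + 1); "
    "int c = b < 0 ? -1 : s.indexOf('<', b + 1); "
    "if (b > 0) { ctx.SiteRegion = c > 0 ? s.substring(a + 1, c) : s.substring(a + 1); } "
    "}"
)

# Body for creating the ingest pipeline.
pipeline_body = {
    "description": "Derive 'site2<site3' from the Site field at index time",
    "processors": [
        {"script": {"lang": "painless", "source": PIPELINE_SCRIPT}}
    ],
}

# Body for the search request once documents carry the derived field.
search_body = {
    "size": 0,
    "aggs": {"by_region": {"terms": {"field": "SiteRegion"}}},
}

print(json.dumps(pipeline_body, indent=2))
print(json.dumps(search_body, indent=2))
```

Documents indexed before the pipeline existed would not have the new field, so they would need to be reindexed through the pipeline before the aggregation covers them.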
