ElasticSearch aggregate on a scripted nested field

doogyatnesta · March 4, 2022, 11:23am

I have the following mapping in my Elasticsearch index (simplified as the other fields are irrelevant:

{
  "test": {
    "mappings": {
      "properties": {
        "name": {
          "type": "keyword"
        },
        "entities": {
          "type": "nested",
          "properties": {
            "text_property": {
              "type": "text"
            },
            "float_property": {
              "type": "float"
            }
          }
        }
      }
    }
  }
}

The data looks like this (again simplified):

[
  {
    "name": "a",
    "entities": [
      {
        "text_property": "foo",
        "float_property": 0.2
      },
      {
        "text_property": "bar",
        "float_property": 0.4
      },
      {
        "text_property": "baz",
        "float_property": 0.6
      }
    ]
  },
  {
    "name": "b",
    "entities": [
      {
        "text_property": "foo",
        "float_property": 0.9
      }
    ]
  },
  {
    "name": "c",
    "entities": [
      {
        "text_property": "foo",
        "float_property": 0.2
      },
      {
        "text_property": "bar",
        "float_property": 0.9
      }
    ]
  }
]

I'm trying perform a bucket aggregation on the maximum value of float_property for each document. So for the example above, the following would be the desired response:

...
{
  "buckets": [
    {
      "key": "0.9",
      "doc_count": 2
    },
    {
      "key": "0.6",
      "doc_count": 1
    }
  ]
}

as doc a's highest nested value for float_property is 0.6, b's is 0.9 and c's is 0.9.

I've tried using a mixture of nested and aggs, along with runtime_mappings, but I'm not sure in which order to use these.

doogyatnesta · March 4, 2022, 5:16pm

I've managed to figure this out in the end.

The two things I hadn't realised were:

You can provide a script instead of a field key to bucket aggregations.
Instead of using nested queries, you can access nested values directly using params._source.

The combination of these two things allowed me to write the correct query:

{
  "size": 0,
  "aggs": {
    "max.float_property": {
      "terms": {
        "script": "double max = 0; for (item in params._source.entities) { if (item.float_property > max) { max = item.float_property; }} return max;"
      }
    }
  }
}

Response:

{
  ...
  "aggregations": {
    "max.float_property": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0,
      "buckets": [
        {
          "key": "0.9",
          "doc_count": 2
        },
        {
          "key": "0.6",
          "doc_count": 1
        }
      ]
    }
  }
}

I'm confused though, because I thought the correct way to access nested fields was by using the nested query type. Unfortunately there's very little documentation for this, so I'm still unsure if this is the intended/correct way to aggregate on scripted nested fields.

Tomo_M · March 6, 2022, 10:32am

In my opinion, as nested fields are separated Lucene documents, they are not stored as doc values. So reconstructing from source field is the only way to accessing the nested fields in painless script.

Of cource you can make the script reusable by runtime mappings of the index or do the calculation during indexing using ingest pipeline.

Topic		Replies	Views
Nested aggregation Elasticsearch	1	548	August 14, 2017
How to get aggregated data by a field from ES with two values accumulated Elasticsearch	0	532	April 16, 2018
Aggregation not working on Scripted field Elasticsearch	1	1201	September 6, 2017
Bucketed aggregation on nested fields Elasticsearch	0	322	November 9, 2020
Writing nested aggregation query in NEST Elasticsearch	1	600	May 8, 2019

ElasticSearch aggregate on a scripted nested field

Related topics