Composite aggregation on runtime field in 6.2

strutdepot-discsn · July 18, 2021, 1:15pm

I have "city", "state" and "country" fields in the index and need to calculate the document count by each city. The index contains more than 2 million documents.

This was my first stab at it:

get students/_search
{
  "size": 0,
  "aggs": {
    "country": {
      "terms": {
        "size": 1000,
        "field": "country.keyword"
      },
      "aggs": {
        "state": {
          "terms": {
            "size": 50,
            "field": "state.keyword"
          },
          "aggs": {
            "city": {
              "terms": {
                "size": 3,
                "field": "city.keyword"
              }
            }
          }
        }
      }
    }
  }
}

This works. But there was a Kibana warning:

#! Deprecation: This aggregation creates too many buckets (10001) and will throw an error in future versions. You should update the [search.max_buckets] cluster setting or use the [composite] aggregation to paginate all buckets in multiple requests.

As a side note, I initially ran the above query with a "size" of 1000 for both "state" and "city" aggregations. The development server ran out of heap space. So was forced to reduce the numbers seen in the query.

Going back to the warning, it sounds like I would switch to a composite query for paginating through the buckets. So I attempted to switch the query to a composite aggregation. But have had limited success.

Attempt 1

GET students/_search
{
  "size": 0,
  "aggs": {
    "country_state_city": {
      "composite": {
        "sources": [
          {
            "product": {
              "terms": {
                "field": "state.keyword"
              }
            }
          },
          {
            "product": {
              "terms": {
                "field": "city.keyword"
              }
            }
          },
          {
            "product": {
              "terms": {
                "field": "country.keyword"
              }
            }
          }
        ]
      }
    }
  }
}

This gave the following result:

"aggregations": {
    "country_state_city": {
      "buckets": [
        {
          "key": {
            "product": "Greece"
          },
          "doc_count": 1
        },
        {
          "key": {
            "product": "Italy"
          },
          "doc_count": 1
        },
        {
          "key": {
            "product": "France"
          },
          "doc_count": 1
        },
        {
          "key": {
            "product": "United States"
          },
          "doc_count": 1
        },

I know for a fact that "United States" has more than one document. After reading the documentation, I am not entirely sure if the query was coined correctly.

Attempt 2

GET students/_search
{
  "size": 0,
  "track_total_hits" : false,
  "aggs": {
    "country_state_city": {
      "composite": {
        "size": 100,
        "sources": [
          {
            "country_state_city": {
              "terms": {
                "script": {
                  "source": "def country= doc['country.keyword'].value; country = (null == country ? new String() : country); def state = doc['state.keyword'].value; state = (null == state ? new String() : ', ' + state); def city = doc['city.keyword'].value; city = (null == city ? new String() : ', ' + city); country + state + city; ",
                  "lang": "painless"
                },
                "order": "desc"
              }
            }
          }
        ]
      }
    }
  }
}

The latest ES version supports runtime fields. But 6.2 (that I am stuck with) does not appear to support this (at least from what I can tell).

Some documents have null values for country, state and city. Hence the null check.

The above query gave the following result:

  "aggregations": {
    "country_state_city": {
      "buckets": [
        {
          "key": {
            "country_state_city": "Åland Islands, Mariehamn"
          },
          "doc_count": 6
        },
        {
          "key": {
            "country_state_city": "Åland Islands, Geta"
          },
          "doc_count": 1
        },
        {
          "key": {
            "country_state_city": "Zimbabwe, Victoria Falls"
          },
          "doc_count": 31
        },

This again is wrong (the actual numbers are higher).

I am wondering what I could be doing wrong?

My Elasticsearch version is 6.2.4. Upgrading the cluster is not an option (operations team is too busy).

system · August 15, 2021, 1:15pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.

Topic		Replies	Views
Composite aggregation ORDER BY Elasticsearch	6	13357	August 15, 2018
ES aggregation query Elasticsearch	5	316	November 23, 2020
Composite aggregation buckets total count Elasticsearch	2	4180	July 24, 2018
Counting amount of buckets in aggregation Elasticsearch painless , aggregations	2	39	August 1, 2024
Composite aggregation max bucket size Elasticsearch	2	7785	December 10, 2018

Composite aggregation on runtime field in 6.2

Related topics