Terms aggregation is slow, setting eager_global_ordinals to True did not work

rupesh.yadav · January 8, 2024, 7:35am

I am quite new to Elasticsearch, so I might be missing a few things. I have a query_string query together with an aggregation, and I am trying to get the results as fast as possible. Currently the request takes 20 to 25 seconds. Using profiling, I found that the query_string query itself returns results in 2 to 3 seconds, while the rest of the time is spent in the aggregation.

I have tried the suggested approaches: I increased the refresh_interval (it is currently 5m) and I set eager_global_ordinals to true, but neither helped. The cluster has 6 nodes and the index has the default 5 shards. The complete corpus is around 50M+, and the query_string query includes a date filter for the last 6 months, since we usually query only the last 6 months of data. Below are the mapping of the aggregated fields, the aggregation query, and the profiling results. Can anyone please help me out here?

ES version: 7.10
CPUs: 4
Memory: 8 GiB

Aggregation query:

{
  "aggs": {
    "nested_agg": {
      "nested": {
        "path": "text.entities"
      },
      "aggs": {
        "filter": {
          "filter": {
            "terms": {
              "text.entities.type.keyword": [
                "Person",
                "Location",
                "Company",
                "Category"
              ]
            }
          },
          "aggs": {
            "type": {
              "terms": {
                "field": "text.entities.type.keyword",
                "size": 100
              },
              "aggs": {
                "entities_text1": {
                  "terms": {
                    "field": "text.entities.text.keyword",
                    "size": 100
                  }
                }
              }
            }
          }
        }
      }
    }
  }
}

Mapping:

{
  "text" : {
    "properties" : {
      "entities" : {
        "type" : "nested",
        "include_in_parent" : true,
        "properties" : {
          "text" : {
            "type" : "text",
            "fields" : {
              "keyword" : {
                "type" : "keyword",
                "eager_global_ordinals" : true
              }
            },
            "eager_global_ordinals" : true
          },
          "type" : {
            "type" : "text",
            "fields" : {
              "keyword" : {
                "type" : "keyword",
                "eager_global_ordinals" : true
              }
            },
            "eager_global_ordinals" : true
          }
        }
      }
    }
  }
}

Profile results (GlobalOrdinalsStringTermsAggregator and collector timings):

        "children" : [
          {
            "type" : "GlobalOrdinalsStringTermsAggregator",
            "description" : "type",
            "time_in_nanos" : 6840094116,
            "breakdown" : {
              "reduce" : 0,
              "post_collection_count" : 1,
              "build_leaf_collector" : 7315157,
              "build_aggregation" : 58250945,
              "build_aggregation_count" : 1,
              "build_leaf_collector_count" : 36,
              "post_collection" : 1428,
              "initialize" : 7349,
              "initialize_count" : 1,
              "reduce_count" : 0,
              "collect" : 6774519237,
              "collect_count" : 4144232
            },
            "debug" : {
              "segments_with_multi_valued_ords" : 36,
              "total_buckets" : 3,
              "collection_strategy" : "remap",
              "result_strategy" : "terms",
              "segments_with_single_valued_ords" : 0,
              "has_filter" : false
            },
            "children" : [
              {
                "type" : "GlobalOrdinalsStringTermsAggregator",
                "description" : "entities_text1",
                "time_in_nanos" : 5062501112,
                "breakdown" : {
                  "reduce" : 0,
                  "post_collection_count" : 1,
                  "build_leaf_collector" : 7059084,
                  "build_aggregation" : 56881363,
                  "build_aggregation_count" : 1,
                  "build_leaf_collector_count" : 36,
                  "post_collection" : 1211,
                  "initialize" : 1542,
                  "initialize_count" : 1,
                  "reduce_count" : 0,
                  "collect" : 4998557912,
                  "collect_count" : 4144232
                },
                "debug" : {
                  "segments_with_multi_valued_ords" : 36,
                  "total_buckets" : 542401,
                  "collection_strategy" : "remap",
                  "result_strategy" : "terms",
                  "segments_with_single_valued_ords" : 0,
                  "has_filter" : false
                }
              }
            ]
          }
        ]
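To make these numbers easier to compare, here is a small Python sketch that converts the quoted aggregator fields into seconds, the share of each aggregator's time spent in `collect`, and the average time per collect call. The values are copied from the output above; the `summarize` helper is hypothetical. With these numbers, roughly 99% of each aggregator's time is in `collect` over the same 4,144,232 collect calls, while `build_leaf_collector` and `build_aggregation` are comparatively small, which suggests the cost is per-document bucket collection.

```python
"""Summarize the GlobalOrdinalsStringTermsAggregator profile entries quoted above.

The numbers are copied from the profile output in this post; `summarize` is an
illustrative helper, not part of any Elasticsearch client.
"""

# Relevant fields copied from the two aggregator entries (times in nanoseconds).
PROFILE = [
    {"description": "type", "time_in_nanos": 6840094116,
     "collect": 6774519237, "collect_count": 4144232, "total_buckets": 3},
    {"description": "entities_text1", "time_in_nanos": 5062501112,
     "collect": 4998557912, "collect_count": 4144232, "total_buckets": 542401},
]


def summarize(entry):
    """Return (total seconds, fraction of time in collect, ns per collect call)."""
    seconds = entry["time_in_nanos"] / 1e9
    collect_share = entry["collect"] / entry["time_in_nanos"]
    ns_per_collect = entry["collect"] / entry["collect_count"]
    return seconds, collect_share, ns_per_collect


if __name__ == "__main__":
    for e in PROFILE:
        s, share, per = summarize(e)
        print(f"{e['description']}: {s:.2f} s, {share:.1%} in collect, "
              f"{per:.0f} ns/collect, {e['total_buckets']} buckets")
```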

collector:

"collector" : [
          {
            "name" : "MultiCollector",
            "reason" : "search_multi",
            "time_in_nanos" : 12538086219,
            "children" : [
              {
                "name" : "MultiCollector",
                "reason" : "search_top_hits",
                "time_in_nanos" : 271210813
              },
              {
                "name" : "MultiBucketCollector: [[nested_agg]]",
                "reason" : "aggregation",
                "time_in_nanos" : 12254584588
              }
            ]
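A similar Python sketch for the collector section computes each child's share of the parent collector's time. With the quoted values, the aggregation accounts for roughly 98% and the top-hits collection for about 2%. The constant and function names are illustrative.

```python
# Collector timings copied from the profile output above (nanoseconds).
TOTAL_NS = 12_538_086_219  # MultiCollector, reason "search_multi"
CHILDREN_NS = {
    "search_top_hits": 271_210_813,
    "aggregation": 12_254_584_588,  # MultiBucketCollector: [[nested_agg]]
}


def shares(total_ns, children_ns):
    """Return each child collector's fraction of the parent collector's time."""
    return {name: ns / total_ns for name, ns in children_ns.items()}


if __name__ == "__main__":
    for name, frac in shares(TOTAL_NS, CHILDREN_NS).items():
        print(f"{name}: {CHILDREN_NS[name] / 1e9:.2f} s ({frac:.1%})")
```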

As you can see, the number of buckets here is more than 500K (542,401 for entities_text1), so I suspect that is the cause, but I would appreciate help confirming it. The collector section has two children: the first is for the query_string query (search_top_hits), which is fast, while the second, the aggregation, is slow. Would upgrading the hardware speed up the aggregation to some extent? Any help, or pointers to something useful, would be appreciated. Thanks!!

Mark_Tozzi · January 25, 2024, 2:10pm

First, I would suggest upgrading to a recent version of Elasticsearch. There have been a lot of performance improvements since 7.10. Second, I would try to remap your data to avoid using a nested field (and the associated nested aggregation). Nested fields are quite slow, as there is a lot of additional overhead for them. If you can use a flattened field instead, I would expect it to perform better.
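One way to follow this suggestion, sketched in Python under assumptions not stated in the thread: group each document's entities by type into a hypothetical field `entities_by_type` mapped as `flattened`, then run one plain terms aggregation per type on its keyed sub-field (for example `entities_by_type.Person`), with no nested aggregation. The field name and helper functions are illustrative, not an established recipe.

```python
"""Sketch: reshape nested entities into an object suited to a flattened field.

Assumption (not from the thread): entities are grouped by type into a
hypothetical field `entities_by_type`, mapped with type `flattened`.
"""
import json
from collections import defaultdict

# Hypothetical mapping: one flattened field instead of the nested `text.entities`.
MAPPING = {"properties": {"entities_by_type": {"type": "flattened"}}}


def group_entities(entities):
    """Turn [{"type": ..., "text": ...}, ...] into {type: [unique texts, in order]}."""
    grouped = defaultdict(list)
    for ent in entities:
        if ent["text"] not in grouped[ent["type"]]:
            grouped[ent["type"]].append(ent["text"])
    return dict(grouped)


def build_aggs(types, size=100):
    """Build one terms aggregation per entity type on the flattened keyed sub-field."""
    return {
        "aggs": {
            t: {"terms": {"field": f"entities_by_type.{t}", "size": size}}
            for t in types
        }
    }


if __name__ == "__main__":
    doc = {"entities_by_type": group_entities([
        {"type": "Person", "text": "Ada Lovelace"},
        {"type": "Location", "text": "London"},
        {"type": "Person", "text": "Ada Lovelace"},
    ])}
    print(json.dumps(doc))
    print(json.dumps(build_aggs(["Person", "Location", "Company", "Category"]), indent=2))
```

Note the semantic change: under the nested aggregation, bucket counts are per entity (nested document), whereas a terms aggregation on a flattened field counts the parent documents that contain each value. Whether this is faster on your data would need to be measured.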

system · February 22, 2024, 2:11pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
