Terms aggregation is slow, setting eager_global_ordinals to True did not work

I am fairly new to ES, so I may be missing something. I have an Elasticsearch query_string query combined with an aggregation, and I am trying to get the results as fast as I can. The request currently takes 20s to 25s. Profiling shows that the query_string query returns in 2s to 3s, whereas the rest of the time is taken by the aggregation.

I have tried the commonly suggested approaches. I increased the refresh_interval (it is currently 5m) and I set eager_global_ordinals to true on the aggregated fields, but neither helped. The cluster has 6 nodes and the index has the default 5 shards. The complete corpus is around 50M+, and the query_string query has a date filter, so we usually query only the last 6 months of data. The mapping of the aggregated fields, the aggregation query, and the profiling results are included below. Can anyone please help me out here?

ES version : 7.10
CPUs : 4
Memory: 8 GiB

Aggregate Query:

{
  "aggs": {
    "nested_agg": {
      "nested": {
        "path": "text.entities"
      },
      "aggs": {
        "filter": {
          "filter": {
            "terms": {
              "text.entities.type.keyword": [
                "Person",
                "Location",
                "Company",
                "Category"
              ]
            }
          },
          "aggs": {
            "type": {
              "terms": {
                "field": "text.entities.type.keyword",
                "size": 100
              },
              "aggs": {
                "entities_text1": {
                  "terms": {
                    "field": "text.entities.text.keyword",
                    "size": 100
                  }
                }
              }
            }
          }
        }
      }
    }
  }
}

Mapping:

{
  "text" : {
    "properties" : {
      "entities" : {
        "type" : "nested",
        "include_in_parent" : true,
        "properties" : {
          "text" : {
            "type" : "text",
            "fields" : {
              "keyword" : {
                "type" : "keyword",
                "eager_global_ordinals" : true
              }
            },
            "eager_global_ordinals" : true
          },
          "type" : {
            "type" : "text",
            "fields" : {
              "keyword" : {
                "type" : "keyword",
                "eager_global_ordinals" : true
              }
            },
            "eager_global_ordinals" : true
          }
        }
      }
    }
  }
}

Profile results (collector and GlobalOrdinalsStringTermsAggregator time):

        "children" : [
          {
            "type" : "GlobalOrdinalsStringTermsAggregator",
            "description" : "type",
            "time_in_nanos" : 6840094116,
            "breakdown" : {
              "reduce" : 0,
              "post_collection_count" : 1,
              "build_leaf_collector" : 7315157,
              "build_aggregation" : 58250945,
              "build_aggregation_count" : 1,
              "build_leaf_collector_count" : 36,
              "post_collection" : 1428,
              "initialize" : 7349,
              "initialize_count" : 1,
              "reduce_count" : 0,
              "collect" : 6774519237,
              "collect_count" : 4144232
            },
            "debug" : {
              "segments_with_multi_valued_ords" : 36,
              "total_buckets" : 3,
              "collection_strategy" : "remap",
              "result_strategy" : "terms",
              "segments_with_single_valued_ords" : 0,
              "has_filter" : false
            },
            "children" : [
              {
                "type" : "GlobalOrdinalsStringTermsAggregator",
                "description" : "entities_text1",
                "time_in_nanos" : 5062501112,
                "breakdown" : {
                  "reduce" : 0,
                  "post_collection_count" : 1,
                  "build_leaf_collector" : 7059084,
                  "build_aggregation" : 56881363,
                  "build_aggregation_count" : 1,
                  "build_leaf_collector_count" : 36,
                  "post_collection" : 1211,
                  "initialize" : 1542,
                  "initialize_count" : 1,
                  "reduce_count" : 0,
                  "collect" : 4998557912,
                  "collect_count" : 4144232
                },
                "debug" : {
                  "segments_with_multi_valued_ords" : 36,
                  "total_buckets" : 542401,
                  "collection_strategy" : "remap",
                  "result_strategy" : "terms",
                  "segments_with_single_valued_ords" : 0,
                  "has_filter" : false
                }
              }
            ]
          }
        ]

collector:

"collector" : [
          {
            "name" : "MultiCollector",
            "reason" : "search_multi",
            "time_in_nanos" : 12538086219,
            "children" : [
              {
                "name" : "MultiCollector",
                "reason" : "search_top_hits",
                "time_in_nanos" : 271210813
              },
              {
                "name" : "MultiBucketCollector: [[nested_agg]]",
                "reason" : "aggregation",
                "time_in_nanos" : 12254584588
              }
            ]

As you can see, the entities_text1 aggregation produces more than 500K buckets, so I suspect that is the reason, but I would appreciate help confirming what the cause actually is. The collector output has two children: the first is for the query_string query, which is fast, and the second is for the aggregation, which is slow. Would bumping up the hardware solve this to some extent and speed up the aggregation? Any help, or a pointer to something useful, would be appreciated. Thanks!
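To make the profile figures above easier to compare, here is a small sketch that converts them to seconds and derives the share of time spent in the collect phase. The numbers are copied from the profile output in this post; the variable and function names are illustrative only and are not part of any Elasticsearch API.

```python
# Figures copied verbatim from the profile output in this post.
NANOS_PER_SECOND = 1_000_000_000

aggregators = {
    "type": {
        "time_in_nanos": 6_840_094_116,
        "collect": 6_774_519_237,
        "collect_count": 4_144_232,
    },
    "entities_text1": {
        "time_in_nanos": 5_062_501_112,
        "collect": 4_998_557_912,
        "collect_count": 4_144_232,
    },
}

collectors = {
    "search_multi": 12_538_086_219,
    "search_top_hits": 271_210_813,
    "aggregation": 12_254_584_588,
}


def summarize(stats):
    """Derive seconds, collect share, and per-call cost for one aggregator."""
    return {
        "seconds": stats["time_in_nanos"] / NANOS_PER_SECOND,
        "collect_share": stats["collect"] / stats["time_in_nanos"],
        "nanos_per_collect": stats["collect"] / stats["collect_count"],
    }


summaries = {name: summarize(stats) for name, stats in aggregators.items()}
aggregation_share = collectors["aggregation"] / collectors["search_multi"]

for name, summary in summaries.items():
    print(
        f"{name}: {summary['seconds']:.2f}s total, "
        f"{summary['collect_share']:.1%} in collect, "
        f"{summary['nanos_per_collect']:.0f} ns per collect call"
    )
print(f"aggregation share of collector time: {aggregation_share:.1%}")
```

On these figures, nearly all of the aggregator time is spent in collect, while build_aggregation takes only tens of milliseconds. That suggests the cost comes from visiting the roughly 4.1 million collected entries rather than from assembling the final buckets.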

First, I would suggest upgrading to a recent version of Elasticsearch. There have been a lot of performance improvements since 7.10. Second, I would try to remap your data to avoid using a nested field (and the associated nested aggregation). Nested fields are quite slow, as there is a lot of additional overhead for them. If you can use a flattened field instead, I would expect it to perform better.
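The remapping suggestion above could look roughly like the following sketch. It is an assumption, not a tested migration: the field name `entities_by_type` and the helper `denest_entities` are hypothetical, and the entity types are taken from the filter in the original aggregation. A flattened field built directly from the original entity objects would lose the pairing between each entity's type and its text, so this sketch groups the texts under one key per type before indexing.

```python
import json
from collections import defaultdict

# Entity types taken from the filter in the original aggregation.
ENTITY_TYPES = ["Person", "Location", "Company", "Category"]


def denest_entities(doc):
    """Return a copy of `doc` with text.entities regrouped by entity type.

    `entities_by_type` is a hypothetical field name, not part of the
    original mapping.
    """
    grouped = defaultdict(list)
    for entity in doc.get("text", {}).get("entities", []):
        if entity["type"] in ENTITY_TYPES:
            grouped[entity["type"]].append(entity["text"])
    text = {k: v for k, v in doc.get("text", {}).items() if k != "entities"}
    text["entities_by_type"] = dict(grouped)
    return {**doc, "text": text}


# Hypothetical mapping: one flattened field instead of the nested one.
mapping = {
    "properties": {
        "text": {
            "properties": {
                "entities_by_type": {"type": "flattened"}
            }
        }
    }
}

# One plain terms aggregation per type; no nested aggregation is involved.
aggs = {
    entity_type: {
        "terms": {
            "field": f"text.entities_by_type.{entity_type}",
            "size": 100,
        }
    }
    for entity_type in ENTITY_TYPES
}

sample = {
    "text": {
        "entities": [
            {"type": "Person", "text": "Alice"},
            {"type": "Location", "text": "Paris"},
            {"type": "Person", "text": "Bob"},
        ]
    }
}

print(json.dumps(denest_entities(sample), indent=2))
print(json.dumps({"size": 0, "aggs": aggs}, indent=2))
```

One difference to keep in mind: inside a nested aggregation, doc_count counts entity objects, whereas with this layout each bucket counts parent documents, so the numbers will not match the original output exactly.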
