Very different aggregation speed for similar fields with different cardinality

Elastic 2.3.3

I have the following mapping, in total there are 500m docs loaded.

{
  "someindex": {
    "mappings": {
      "sometype": {
        "properties": {
          "field1": {
            "type": "integer"
          },
          "field2": {
            "type": "string",
            "index": "not_analyzed"
          },
          "field3": {
            "type": "string",
            "index": "not_analyzed"
          }
        }
      }
    }
  }
}

This query performs extremely fast, no problems with it at all:

{
    "query": {
        "bool": {
            "filter": [
                {
                    "term": {
                        "field1": N
                    }
                }
            ]
        }
    },
    "size": 0
}

{
  "took": 4,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "failed": 0
  },
  "hits": {
    "total": 107,
    "max_score": 0.0,
    "hits": []
  }
}

This one works fine two:

{
    "query": {
        "bool": {
            "filter": [
                {
                    "term": {
                        "field1": N
                    }
                }
            ]
        }
    },
    "size": 0,
    "aggs": {
        "by_field2": {
            "terms": {
                "field": "field2",
                "size": 1000
            }
        }
    }
}

{
  "took": 201,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "failed": 0
  },
  "hits": {
    "total": 107,
    "max_score": 0.0,
    "hits": []
  },
  "aggregations": {
    "by_field2": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0,
      "buckets": [
          ...
      ]
    }
  }
}

But this one works much longer than 60 seconds:

{
    "query": {
        "bool": {
            "filter": [
                {
                    "term": {
                        "field1": N
                    }
                }
            ]
        }
    },
    "size": 0,
    "aggs": {
        "by_field3": {
            "terms": {
                "field": "field3",
                "size": 1000
            }
        }
    }
}

Cardinality of field3 is much larger than field2's, but I don't see the point why elastic requires more than a minute to process 107 results. Can you please show me the reason for such behavior or any useful tip to fix the query?

Anything in logs?

If you upgrade to 5,2, you can use Profile API which works as well with aggs. Even better with the basic license of XPack (free) you can visualize that in Kibana dev tools.

You can also run a hot_threads call to see what is taking time.

BTW what is your index settings, size... nb of nodes...

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.