Slow aggregation no matter the size of the result set

Hi,

We're seeing strange behavior with an aggregation.

The query below takes 1 to 1.5 seconds to execute:

GET index/type/_search
{
  "size": 0,
  "query": {
    "bool": {
      "must": [
        {
          "term": {
            "query": {
              "value": "some search query"
            }
          }
        }
      ]
    }
  },
  "aggs": {
    "queries": {
      "terms": {
        "field": "query.keyword",
        "size": 10
      }
    }
  }
}

After investigating, we found that the slowness comes from the aggregation on one specific field, query.keyword. Interestingly, the aggregation is slow whether the result set contains 40,000 documents or just 1: the runtime is roughly the same.

However, the aggregation is fast when run on a different field, for example title (also a keyword field, see the mappings below). So it seems to be related to the query field and/or its subfields.

Here are the mappings:

...
          "query": {
            "type": "text",
            "fields": {
              "keyword": {
                "type": "keyword",
                "ignore_above": 256
              }
            }
          },
          "title": {
            "type": "keyword"
          }
...

The index shard setup is 2p+1r, with a total of 9,992,927 documents.
Elasticsearch 5.3.2

Some statistics:

Query 1 - without aggregation
Returned docs: 1
Execution time: 34ms

Query 2 - without aggregation
Returned docs: 40,000
Execution time: 35ms


Query 1 - with aggregation on query.keyword
Returned docs: 1
Execution time: 1427ms

Query 2 - with aggregation on query.keyword
Returned docs: 40,000
Execution time: 1350ms


Query 1 - with aggregation on title
Returned docs: 1
Execution time: 56ms

Query 2 - with aggregation on title
Returned docs: 40,000
Execution time: 63ms


Question:

Why does aggregating on query.keyword slow the query down regardless of how small the filtered result set is? The aggregation should only run on the result set, right?

Does anyone have any thoughts on this, or on how I can debug it further?

Try setting execution_hint to map on the terms aggregation.
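
For reference, a minimal sketch of that suggestion applied to the request body from the original post, sent to the same GET index/type/_search endpoint. The only change is the added execution_hint parameter. A likely explanation for the timings above, though not confirmed in this thread, is that a terms aggregation on a keyword field defaults to global ordinals. These are built per shard over every distinct value in the field, so their cost does not depend on how many documents the query matches. With map, the aggregation works directly on the values of the matching documents instead.

```json
{
  "size": 0,
  "query": {
    "bool": {
      "must": [
        {
          "term": {
            "query": {
              "value": "some search query"
            }
          }
        }
      ]
    }
  },
  "aggs": {
    "queries": {
      "terms": {
        "field": "query.keyword",
        "size": 10,
        "execution_hint": "map"
      }
    }
  }
}
```

The map hint tends to pay off when the query matches only a small number of documents. For queries that match a large share of the index, the default may well be faster, so it is worth measuring both.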

Thank you very much @Mark_Harwood! This solved the problem. :)
