Broken track_total_hits behaviour in aggregation

spinscale · April 10, 2025, 3:57pm

Hey,

I am running an aggregation-only request that looks like this:

GET product_data/_search?request_cache=false&terminate_after=500000
{
  "timeout": "300ms", 
  "track_total_hits": true, 
  "size": 0, 
  "query": {
    "bool": {
      "filter": [
        {
          "simple_query_string": {
            "query": "buch",
            "default_operator": "AND",
            "fields": [
              "category_names"
            ]
          }
        }
        // few more filters here
      ]
    }
  },
  "aggs": {
    "category_id": {
      "terms": {
        "field": "category_id"
      }
    }
  }
}

I terminate after 500k documents to avoid processing millions of documents. With track_total_hits enabled, this request takes roughly 100ms; with track_total_hits disabled or left out completely, the runtime exceeds the 300ms timeout and takes about 380ms.

The responses also differ: with track_total_hits disabled, terminate_after does not seem to be honored. This is the response with track_total_hits set to true:

{
  "took": 96,
  "timed_out": false,
  "terminated_early": true,
  "_shards": {
    "total": 4,
    "successful": 4,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 2000000,
      "relation": "eq"
    },
    "max_score": null,
    "hits": []
  },
  "aggregations": {
    "categoryId": {
      "doc_count_error_upper_bound": 16008,
      "sum_other_doc_count": 929766,
      "buckets": [
        {
          "key": "A",
          "doc_count": 393317
        },
        {
          "key": "B",
          "doc_count": 247372
        },
        {
          "key": "C",
          "doc_count": 221628
        },
        {
          "key": "D",
          "doc_count": 207917
        }
      ]
    }
  }
}

This is the response when track_total_hits is not set or is set to false:

{
  "took": 424,
  "timed_out": false,
  "terminated_early": false,
  "_shards": {
    "total": 4,
    "successful": 4,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 10000,
      "relation": "gte"
    },
    "max_score": null,
    "hits": []
  },
  "aggregations": {
    "categoryId": {
      "doc_count_error_upper_bound": 79596,
      "sum_other_doc_count": 4039869,
      "buckets": [
        {
          "key": "A",
          "doc_count": 2200027
        },
        {
          "key": "B",
          "doc_count": 1055606
        },
        {
          "key": "C",
          "doc_count": 934889
        },
        {
          "key": "D",
          "doc_count": 881217
        }
      ]
    }
  }
}

You can see the counts clearly exceed the expected maximum of 2 million (4 shards × 500,000). Is there some block-max WAND optimization that does not work properly?
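The arithmetic behind that expectation can be checked directly. terminate_after is a per-shard limit, so with 4 shards and terminate_after=500000 at most 2,000,000 documents should reach the aggregation. The following is only a minimal sketch: the helper names are hypothetical, the dicts are trimmed copies of the two responses above, and it assumes category_id is single-valued so that each document lands in exactly one bucket.

```python
# Sketch only: helper names are hypothetical, and the dicts below are trimmed
# copies of the two responses in this post. Assumes category_id is single-valued.

TERMINATE_AFTER = 500_000  # value passed as ?terminate_after=... in the request


def aggregated_doc_count(response, agg_name):
    """Documents seen by a terms aggregation: returned buckets plus the rest."""
    agg = response["aggregations"][agg_name]
    return sum(b["doc_count"] for b in agg["buckets"]) + agg["sum_other_doc_count"]


def terminate_after_bound(response, terminate_after):
    """terminate_after is a per-shard limit, so the bound scales with shard count."""
    return response["_shards"]["total"] * terminate_after


def trimmed_response(doc_counts, sum_other):
    """Build a minimal response dict holding only the fields used above."""
    return {
        "_shards": {"total": 4},
        "aggregations": {
            "categoryId": {
                "sum_other_doc_count": sum_other,
                "buckets": [{"doc_count": c} for c in doc_counts],
            }
        },
    }


with_tracking = trimmed_response([393317, 247372, 221628, 207917], 929766)
without_tracking = trimmed_response([2200027, 1055606, 934889, 881217], 4039869)

bound = terminate_after_bound(with_tracking, TERMINATE_AFTER)
for name, resp in [("track_total_hits=true", with_tracking),
                   ("track_total_hits unset", without_tracking)]:
    seen = aggregated_doc_count(resp, "categoryId")
    print(f"{name}: {seen} docs aggregated, bound {bound}, within bound: {seen <= bound}")
```

With the numbers from the two responses, the first case sums to exactly 2,000,000 (matching hits.total), while the second sums to 9,111,608, well past the bound.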

This is on Elasticsearch 8.14.1.

Is this a known bug, or is there anything I can do? Off the top of my head, setting track_total_hits: true on aggregating queries should not have any negative effect if everything works as expected, but some confirmation would be great.

Thanks for any hints about what is happening here.

Have a great week!

--Alex

spinscale · April 11, 2025, 7:02am

Testing this under 8.17.4 (unfortunately a local single node, so I cannot do performance comparisons):

The terminate_after flag is not honored at all; execution always returns the high number of documents, far more than the number specified in terminate_after. Execution time is always the same, but I don't know whether the newer version sped up the query or whether my local system just has more cores and resources, so this might still be slow on the current version...

Ignacio_Vera · April 11, 2025, 8:49am

Hello!

Yes, you are right. I opened an issue: terminate_after is not honour when size = 0 and query has aggregations · Issue #126665 · elastic/elasticsearch · GitHub

Ignacio_Vera · April 11, 2025, 8:50am

Thinking about your use case: if what you want is to avoid running your aggregations on the whole dataset, you might want to consider using the sampler aggregation instead.
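For illustration, a request body using the sampler aggregation could be built roughly as follows. This is a sketch under assumptions, not a tested replacement for the original request: the function name and the "sample" aggregation name are hypothetical, and the shard_size of 500000 simply mirrors the terminate_after value from the first post. The sampler aggregation limits its sub-aggregations to a per-shard sample of the top-scoring documents, with shard_size capping that sample.

```python
import json


def sampled_terms_request(query, field, shard_size):
    """Wrap a terms aggregation in a sampler aggregation limited per shard."""
    return {
        "size": 0,
        "query": query,
        "aggs": {
            "sample": {  # hypothetical aggregation name
                "sampler": {"shard_size": shard_size},
                "aggs": {
                    "category_id": {"terms": {"field": field}},
                },
            }
        },
    }


# Same filter as in the original request (the additional filters are omitted).
query = {
    "bool": {
        "filter": [
            {
                "simple_query_string": {
                    "query": "buch",
                    "default_operator": "AND",
                    "fields": ["category_names"],
                }
            }
        ]
    }
}

body = sampled_terms_request(query, "category_id", 500_000)
print(json.dumps(body, indent=2))
```

The printed JSON can be sent as the body of GET product_data/_search. Whether it performs better than terminate_after depends on the data and the version, so it would need to be measured.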

spinscale · April 11, 2025, 8:55am

Hey Ignacio,

So far I have managed to work without scoring, because the query does not have any scoring part. In this case the sampler agg would act like terminate_after, I guess.

I just tried running the sampler agg and it is much slower than terminate_after on 8.14.

--Alex
