One-hit buckets

gm42 · September 20, 2016, 8:32am

Scenario:

I would like to aggregate documents on some terms, but I only need to know whether each term occurs in at least one document; I do not care about the count of matching documents.

What would be a good way to implement this? A custom metric bucket? I could easily see a boolean check in a reduce operation.

The advantage over proper counting would be to save some computation.

jpountz · September 20, 2016, 9:54am

Do you know the terms that you want to check in advance? If so, you could add these terms to a filter clause and then use terminate_after=1 to stop processing the request after the first match.

gm42 · September 20, 2016, 9:55am

Yes, I do.

Wow, thanks! I think that will work. I will engineer the aggregation filter so that it uses terminate_after.

jpountz · September 20, 2016, 10:18am

Actually, an aggregation filter will not help. I was thinking more of using one search request per term whose presence you want to check (you can put them all in a single multi-search request to save round trips).

Something like:

GET _search?terminate_after=1&size=0
{
  "query": {
    "bool": {
      "must": [
        // your query
      ],
      "filter": [
        {
          "term": {
            "field_to_check": "term_to_check"
          }
        }
      ]
    }
  }
}

You can check whether there are documents that match both your query and term_to_check by looking at whether this query has a total number of hits that is greater than 0.
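For several terms, the per-term requests can be batched into one multi-search call. The following is only a rough sketch of how the request body could be assembled and the responses read back; the helper names build_msearch_body and present_terms are made up for illustration, and it assumes terminate_after and size are passed in each search body rather than in the URL. The hits.total handling covers both the plain integer of older versions and the object form ({"value": ...}) of newer ones.

```python
import json


def build_msearch_body(base_query, field, terms, index=None):
    """Build an NDJSON _msearch body with one presence check per term.

    Hypothetical helper: each search asks for no hits (size 0) and stops
    collecting after the first matching document (terminate_after 1).
    """
    lines = []
    for term in terms:
        # Header line: empty when the index is given in the request URL.
        header = {"index": index} if index else {}
        body = {
            "size": 0,
            "terminate_after": 1,
            "query": {
                "bool": {
                    "must": [base_query],
                    "filter": [{"term": {field: term}}],
                }
            },
        }
        lines.append(json.dumps(header))
        lines.append(json.dumps(body))
    # The multi-search body must end with a newline.
    return "\n".join(lines) + "\n"


def present_terms(terms, msearch_response):
    """Return the terms whose search reported at least one hit."""
    present = set()
    # Responses come back in the same order as the requests.
    for term, resp in zip(terms, msearch_response["responses"]):
        total = resp["hits"]["total"]
        if isinstance(total, dict):  # newer versions: {"value": n, ...}
            total = total["value"]
        if total > 0:
            present.add(term)
    return present
```

The string returned by build_msearch_body would be sent as the body of a _msearch request, and the parsed JSON response passed to present_terms together with the same list of terms.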

gm42 · September 20, 2016, 11:45am

I just noticed that terminate_after applies to all aggregations, not specific ones, so that wouldn't help.

Thanks for your second reply. I am afraid this second-query approach wouldn't work, as the computation for the terms is scripted and expensive. I will use a regular aggregation for the time being, and maybe later on I will try a scripted metric aggregation if I can wrap my head around it.
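With a regular terms aggregation, presence can be read off the response by ignoring doc_count. A minimal sketch, assuming the default min_doc_count of 1 (so every returned bucket has at least one document) and a bucket size large enough to return all terms of interest; terms_present and the aggregation name passed to it are hypothetical:

```python
def terms_present(search_response, agg_name):
    """Return the set of bucket keys from a terms aggregation response.

    With the default min_doc_count of 1, a bucket is only returned when
    at least one document matched, so the counts can be ignored.
    """
    buckets = search_response["aggregations"][agg_name]["buckets"]
    return {bucket["key"] for bucket in buckets}
```

This does not save any of the counting work on the Elasticsearch side; it only simplifies what the client keeps.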

gm42 · September 20, 2016, 2:01pm

I will use a global bucket for this, and since I am using function score, I hope it will be re-used across documents, which would allow me to save the computations.
