One-hit buckets

(gm42) #1


I would like to aggregate documents on some terms, but I only need to know whether each term is present in at least one document; I do not care about the number of matching documents.

What would be a good way to implement this? A custom metric bucket? I could easily see a boolean check in a reduce operation.

The advantage over proper counting would be to save some computation.

(Adrien Grand) #2

Do you know the terms that you want to check in advance? If so, you could just add these terms to a filter clause and then use terminate_after=1 to stop processing the request after the first match.

(gm42) #3

Yes, I do.

Wow, thanks! I think that will work. I will set up the aggregation filter so that it uses terminate_after.

(Adrien Grand) #4

Actually, an aggregation filter will not help. I was thinking more of using one search request per term whose presence you want to check (you can put them all in a single multi-search request to save round trips).

Something like

GET _search?terminate_after=1&size=0
{
  "query": {
    "bool": {
      "must": [
        // your query
      ],
      "filter": [
        { "term": { "field_to_check": "term_to_check" } }
      ]
    }
  }
}

You can check whether there are documents that match both your query and term_to_check by looking at whether this query has a total number of hits that is greater than 0.
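To make the multi-search variant concrete, here is a minimal Python sketch, not a definitive implementation. It only builds the newline-delimited body for a `_msearch` request (one search per term) and reads the presence flags back out of a response that has already been parsed from JSON. The helper names `build_msearch_body` and `present_terms` are made up for illustration, and the sketch assumes `terminate_after` and `size` are passed in each search body rather than in the URL. Since `terminate_after` applies per shard, the total can be larger than 1, so the check is only "greater than 0".

```python
import json


def build_msearch_body(base_query, field, terms):
    """Build an NDJSON _msearch body with one presence check per term.

    base_query, field and terms are placeholders for your own query,
    the field to check and the terms to look for.
    """
    lines = []
    for term in terms:
        # Empty header: fall back to the index given in the request URL, if any.
        lines.append(json.dumps({}))
        lines.append(json.dumps({
            "size": 0,             # no hits needed, only the total
            "terminate_after": 1,  # stop collecting after the first match per shard
            "query": {
                "bool": {
                    "must": [base_query],
                    "filter": [{"term": {field: term}}],
                }
            },
        }))
    # The multi-search body must end with a newline.
    return "\n".join(lines) + "\n"


def present_terms(terms, msearch_response):
    """Map each term to True if its search reported at least one hit."""
    found = {}
    for term, resp in zip(terms, msearch_response["responses"]):
        # A failed sub-request has no "hits" section; treat it as not found.
        total = resp.get("hits", {}).get("total", 0)
        # Newer versions return {"value": n, "relation": ...}, older ones an int.
        if isinstance(total, dict):
            total = total["value"]
        found[term] = total > 0
    return found
```

The string returned by `build_msearch_body` would be sent as the body of a `_msearch` request, and the parsed JSON response handed to `present_terms` together with the same list of terms, in the same order.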

(gm42) #5

I just noticed that terminate_after applies to all aggregations, not specific ones, so that wouldn't help.

Thanks for your second reply. I am afraid this second-query approach wouldn't work, as the computation for the terms is scripted and expensive. I will use a regular aggregation for the time being, and maybe later on I will try a scripted metric aggregation if I can wrap my head around it.

(gm42) #6

I will use a global bucket for this. Since I am using function score, I hope it will be reused across documents, which would let me save the computations.
