I would like to aggregate documents on some terms, but I only care to know if that term is there for at least 1 document, and absolutely do not care about the count of matching documents.
What would be a good way to implement this? A custom metric bucket? I could easily see a boolean check in a reduce operation.
The advantage over proper counting would be to spare some computations.
Do you know the terms that you want to check in advance? If yes you could just add these terms to a FILTER clause and then use
terminate_after=1 in order to stop processing the request after the first match.
Yes, I do.
Wow, thanks! I think that will work. I will engineer the aggregation filter so that it uses terminate_after.
Actually, an aggregation filter will not help. I was thinking more of using one search request per term whose presence you want to check (you can put them all in a single multi-search request to save round trips).
// your query
You can check whether there are documents that match both your query and
term_to_check by looking at whether this query has a total number of hits that is greater than 0.
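For illustration, here is a rough Python sketch of how the per-term requests could be assembled into one multi-search body and how the hit totals could be read back. It only builds the newline-delimited payload and does not send anything. The helper names, the index name "docs", the field name "tags" and the match_all base query are all hypothetical placeholders, not taken from the thread; the exact shape of hits.total also depends on the Elasticsearch version, so both forms are handled.

```python
import json


def build_presence_msearch(index, field, terms, base_query):
    """Build an NDJSON multi-search body with one small search per term.

    Every name passed in here is a placeholder for illustration.
    """
    lines = []
    for term in terms:
        header = {"index": index}
        body = {
            "size": 0,             # no documents needed, only the hit total
            "terminate_after": 1,  # stop collecting after the first match per shard
            "query": {
                "bool": {
                    "must": [base_query],                  # "your query"
                    "filter": [{"term": {field: term}}],   # term_to_check
                }
            },
        }
        lines.append(json.dumps(header))
        lines.append(json.dumps(body))
    # The multi-search body is newline-delimited and must end with a newline.
    return "\n".join(lines) + "\n"


def term_is_present(response):
    """Return True if a single search response reports at least one hit."""
    total = response["hits"]["total"]
    # Newer versions return {"value": n, "relation": ...}; older ones a plain int.
    if isinstance(total, dict):
        total = total["value"]
    return total > 0


payload = build_presence_msearch("docs", "tags", ["red", "blue"], {"match_all": {}})
```

The payload would then be posted to the multi-search endpoint, and term_is_present applied to each entry of the returned responses array, in the same order as the terms.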
I just noticed that terminate_after applies to all aggregations, not specific ones, so that wouldn't help.
Thanks for your second reply. I am afraid this second-query approach wouldn't work, as the computation for the terms is scripted and expensive. I will use a regular aggregation for the time being, and maybe later on I will try a scripted metric aggregation if I can wrap my head around it.
I will use a global bucket for this, and since I am using function score I hope it will be re-used across documents, thus allowing me to save the computations.