I would like to aggregate documents on some terms, but I only care to know if that term is there for at least 1 document, and absolutely do not care about the count of matching documents.
What would be a good way to implement this? A custom metric bucket? I could easily see a boolean check in a reduce operation.
The advantage over proper counting would be to spare some computations.
Do you know the terms that you want to check in advance? If yes you could just add these terms to a FILTER clause and then use terminate_after=1 in order to stop processing the request after the first match.
Actually, an aggregation filter will not help. I was thinking more of using one search request per term whose presence you want to check (you can put them all in a single multi-search request to save round trips).
You can check whether there are documents that match both your query and term_to_check by looking at whether this query has a total number of hits that is greater than 0.
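To make that concrete, here is a rough sketch of how the multi-search body could be assembled and how the responses could be read. It only builds the NDJSON payload and interprets the results; it does not send anything. The index name, the field name `tag`, and the helper names `build_presence_msearch` and `terms_present` are made up for illustration, and it assumes the terms live in an indexed field that a plain `term` filter can match. Also keep in mind that `terminate_after` applies per shard, so the total can be larger than 1.

```python
import json


def build_presence_msearch(index, base_query, field, terms):
    """Build an NDJSON multi-search body with one size-0 search per term.

    All arguments are illustrative: `index`, `field` and `base_query`
    stand in for whatever your real request uses.
    """
    lines = []
    for term in terms:
        header = {"index": index}
        body = {
            "size": 0,             # no hits needed, only the total
            "terminate_after": 1,  # stop collecting on each shard after the first match
            "query": {
                "bool": {
                    "must": [base_query],
                    "filter": [{"term": {field: term}}],
                }
            },
        }
        lines.append(json.dumps(header))
        lines.append(json.dumps(body))
    # The multi-search API expects the body to end with a newline.
    return "\n".join(lines) + "\n"


def terms_present(terms, responses):
    """Map each term to True if its search reported at least one hit.

    `responses` is the "responses" array of the multi-search reply,
    in the same order as the searches were sent.
    """
    present = {}
    for term, resp in zip(terms, responses):
        total = resp["hits"]["total"]
        # Newer versions return {"value": n, "relation": ...}; older ones a plain int.
        count = total["value"] if isinstance(total, dict) else total
        present[term] = count > 0
    return present
```

You would post the string returned by `build_presence_msearch` to the `_msearch` endpoint and pass the `responses` array of the reply to `terms_present`.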
I just noticed that terminate_after applies to all aggregations, not specific ones, so that wouldn't help.
Thanks for your second reply; I am afraid this second-query approach wouldn't work as the computation for the terms is scripted and expensive; I will use a regular aggregation for the time being, and maybe later on I will try some scripted metric aggregation if I can wrap my head around it.
I will use a global bucket for this, and since I am using function score, I hope it will be reused across documents, allowing me to save the computations.