There are many different businesses, and each business runs (owns) many different activities.
I want to query for results such that each result is from a unique business.id. How would I go about doing this?
Currently I'm inefficiently looping:
querying ES for 30 results (I'm only looking for 16)
taking the first activity for each encountered business.id, temporarily storing that business.id in a list, and skipping any other activities in the loop whose business.id has already been encountered
on subsequent loops, querying for another 30 results while excluding (via a bool : must_not : terms filter) the previously encountered business.ids in the list
This is terrible. The way our data is structured and how businesses interact with our site means that there are sometimes 5 or 6 loops, just to get 16 results unique by business.id.
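For reference, the loop described above looks roughly like the sketch below. The `search` callable is a hypothetical stand-in for the real Elasticsearch call (it is assumed to apply the must_not terms filter on the excluded ids), and the `hit["business"]["id"]` shape is an assumption about how the activity documents are laid out.

```python
def unique_by_business(search, wanted=16, page_size=30):
    """Collect up to `wanted` hits, at most one per business id.

    `search(excluded_ids, size)` is a hypothetical callable that returns a
    list of hits whose business id is NOT in `excluded_ids`, e.g. by sending
    {"bool": {"must_not": {"terms": {"business.id": excluded_ids}}}}.
    """
    seen_ids = []      # business ids already represented in the results
    results = []
    round_trips = 0    # how many queries were needed

    while len(results) < wanted:
        hits = search(seen_ids, page_size)
        round_trips += 1
        if not hits:
            break      # nothing left to fetch
        for hit in hits:
            business_id = hit["business"]["id"]
            if business_id in seen_ids:
                continue   # skip further activities of the same business
            seen_ids.append(business_id)
            results.append(hit)
            if len(results) == wanted:
                break

    return results, round_trips
```

When a handful of businesses dominate the top of the ranking, most of each page is discarded, which is why several round trips are needed for only 16 results.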
I'm not a hundred percent sure this is what you want, but I think the next minor release (Elasticsearch 5.3) has exactly what you are looking for. The feature is called field collapsing; see the docs.
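As a rough sketch of what such a request could look like, the snippet below builds a search body with a `collapse` section, written as a plain Python dict so it can be passed to whichever client you use. The helper name `build_collapse_body` and the `match_all` default are illustrative assumptions; the field name `business.id` comes from the question, and the collapse field needs to be a single-valued keyword or numeric field with doc values.

```python
def build_collapse_body(field="business.id", size=16, query=None):
    """Build a search body that returns at most one hit per value of `field`.

    Hypothetical helper: only the shape of the returned dict matters.
    """
    return {
        # Replace match_all with the real activity query.
        "query": query or {"match_all": {}},
        # Keep only the top-ranked hit for each distinct business id.
        "collapse": {"field": field},
        # Number of collapsed hits (i.e. distinct businesses) to return.
        "size": size,
    }
```

The response then comes back as ordinary hits, one per business, so no looping or must_not bookkeeping is needed on the client side.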
I've been looking into aggregations: specifically, if I bucket by business.id and then add a top_hits sub-aggregation with size=1, might this work for my scenario? It does mean I need to parse my data out of the aggregations rather than just 'normal' results (which is what I'd obviously prefer, and which looks like what collapse gives you).
Edit: In fact, I'd almost guess that is what collapse does, internally... if so, intriguing!
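A minimal sketch of the aggregation route, for comparison: a terms aggregation on business.id with a top_hits sub-aggregation of size 1, plus a small function that pulls the single hit out of each bucket. The aggregation names `by_business` and `top_activity` and both helper functions are made up for illustration. Note that terms buckets are ordered by document count by default, not by relevance, so the ordering of businesses may differ from a normal search.

```python
def build_top_hit_body(field="business.id", wanted=16, query=None):
    """Build a body that buckets by `field` and keeps one hit per bucket."""
    return {
        "size": 0,  # no normal hits; results are read from the aggregation
        "query": query or {"match_all": {}},
        "aggs": {
            "by_business": {
                "terms": {"field": field, "size": wanted},
                "aggs": {
                    # Best-scoring activity within each business bucket.
                    "top_activity": {"top_hits": {"size": 1}}
                },
            }
        },
    }


def extract_top_hits(response):
    """Return the single top hit from every business bucket, in bucket order."""
    buckets = response["aggregations"]["by_business"]["buckets"]
    return [
        bucket["top_activity"]["hits"]["hits"][0]
        for bucket in buckets
        if bucket["top_activity"]["hits"]["hits"]
    ]
```

This yields one activity per business in a single request, at the cost of the extra parsing step mentioned above.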