Results with a unique value in a field?

I'm using ES 5.2. I've indexed activities (classes, local events, etc.) that are put on by businesses, along these lines:

{
  "name": "Johnny's BBQ Fundraiser",
  "price": "$15/plate",
  "date": "2016-04-28 12:00:00",
  "location": "City Hall",
  "business": {
    "id": 423,
    "name": "Johnny's Smokey BBQ Restaurant"
  }
}

There are many different businesses, and each business runs (owns) many different activities.

I want to query for results such that each result is from a unique business.id. How would I go about doing this?

Currently I'm inefficiently looping:

  • query ES for 30 results (I only need 16)
  • take the first activity for each business.id encountered, store that business.id in a temporary list, and skip any other activities in the loop whose business.id is already in the list
  • on each subsequent loop, query for another 30 results, excluding the previously encountered business.ids in the list (via a bool : must_not : terms filter)
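
For reference, the loop described above could be sketched roughly like this. It is only an illustration: `build_page_query`, `take_first_per_business` and `base_query` are hypothetical names, and the step that actually sends the body to Elasticsearch is left out.

```python
def build_page_query(base_query, seen_business_ids, size=30):
    """Build a search body that excludes businesses already encountered."""
    bool_query = {"must": [base_query]}
    if seen_business_ids:
        # bool : must_not : terms on the business ids seen so far
        bool_query["must_not"] = [
            {"terms": {"business.id": sorted(seen_business_ids)}}
        ]
    return {"size": size, "query": {"bool": bool_query}}


def take_first_per_business(hits, seen_business_ids, results, wanted=16):
    """Keep the first activity per business.id, skipping businesses already seen."""
    for hit in hits:
        business_id = hit["_source"]["business"]["id"]
        if business_id in seen_business_ids:
            continue
        seen_business_ids.add(business_id)
        results.append(hit)
        if len(results) >= wanted:
            break
    return results
```

Each pass builds a new body with `build_page_query`, runs it, and feeds the returned hits to `take_first_per_business` until 16 results have been collected or a page comes back empty.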

This is terrible. The way our data is structured and how businesses interact with our site mean that it sometimes takes 5 or 6 loops just to get 16 results unique by business.id.

There must be a better way.

Hey,

I'm not a hundred percent sure this is what you want, but I think the next minor release (Elasticsearch 5.3) has exactly what you are looking for. The feature is called field collapsing; see the docs:

https://www.elastic.co/guide/en/elasticsearch/reference/5.x/search-request-collapse.html
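
A minimal sketch of what such a request body might look like, assuming `business.id` is mapped as a single-valued numeric or keyword field. The helper name `build_collapse_query` and the `base_query` argument are made up for illustration; only the `collapse` section with its `field` option comes from the linked docs.

```python
import json


def build_collapse_query(base_query, size=16):
    """Build a search body that returns at most one hit per business.id."""
    return {
        "size": size,
        "query": base_query,
        # Field collapsing (5.3+): keep only the top hit for each business.id
        "collapse": {"field": "business.id"},
    }


if __name__ == "__main__":
    print(json.dumps(build_collapse_query({"match_all": {}}), indent=2))
```

The results come back as ordinary hits, and the activity shown for each business is the one that sorts first for that business under the request's sort order. As far as I know, the total hit count in the response still counts all matching documents rather than the collapsed groups.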

--Alex

That is interesting - thanks Alexander.

I've been looking into aggregations. Specifically, if I bucket by business.id and then add a top_hits aggregation with size=1, might this work for my scenario? It does mean I'd need to parse my data out of the aggregations rather than the 'normal' results (which I'd obviously prefer, and which is what collapse appears to return).

Edit: In fact, I'd almost guess that is what collapse does, internally... if so, intriguing!
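
A rough sketch of the aggregation idea, assuming a terms aggregation on `business.id` with a `top_hits` sub-aggregation. The aggregation names `by_business` and `top_activity` and both helper functions are hypothetical, chosen just for this example.

```python
def build_top_hit_per_business_query(base_query, businesses=16):
    """Build a body that buckets by business.id and keeps one hit per bucket."""
    return {
        "size": 0,  # the normal hits are not needed, only the aggregation
        "query": base_query,
        "aggs": {
            "by_business": {
                "terms": {"field": "business.id", "size": businesses},
                "aggs": {"top_activity": {"top_hits": {"size": 1}}},
            }
        },
    }


def extract_top_hits(response):
    """Pull the single top hit out of each business bucket."""
    buckets = response["aggregations"]["by_business"]["buckets"]
    return [bucket["top_activity"]["hits"]["hits"][0] for bucket in buckets]
```

One thing to watch: a terms aggregation orders its buckets by document count by default, not by relevance, so the bucket order would probably need adjusting (for example by ordering on a max sub-aggregation of the score) if the ranking of businesses matters.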

Hey,

Yes, that could be a valid workaround for now, but if I recall correctly, the new feature should be faster than aggs.

--Alex
