Hi there, I have a question about Elasticsearch's aggregation functionality. We have a use case where we need to search with a search term, group the results by a field in the document, and read the documents within each bucket, not just the bucket counts.
I have done some reading and it seems there are two ways to achieve this: (1) an aggregation (a "top_hits" sub-aggregation can return the actual documents within each bucket), or (2) collapsing on a field. I am leaning towards aggregation because (1) it returns all results in a single call, unlike collapse, which requires inner hits and makes additional calls to retrieve the documents within each collapsed group, and (2) it operates on the entire document set matched by the search query, unlike collapse, which works only on the top sorted documents and returns only a representative of each group rather than the entire document set.
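To make the comparison concrete, this is roughly how I understand the two request bodies. It is only a sketch in Python that builds the JSON bodies and sends nothing to a cluster. The field names `title` and `category`, the aggregation names `by_group` and `docs`, and the default sizes are placeholders I made up for illustration.

```python
import json


def build_agg_body(term, search_field="title", group_field="category",
                   max_groups=10, docs_per_group=5, offset=0):
    """Option 1: terms aggregation with a top_hits sub-aggregation.

    Assumes group_field is a keyword (not analyzed text) field.
    """
    return {
        "size": 0,  # skip the flat hit list; documents come back inside the buckets
        "query": {"match": {search_field: term}},
        "aggs": {
            "by_group": {
                "terms": {"field": group_field, "size": max_groups},
                "aggs": {
                    "docs": {
                        "top_hits": {
                            "from": offset,          # pagination within a bucket
                            "size": docs_per_group,
                            "sort": [{"_score": {"order": "desc"}}],
                        }
                    }
                },
            }
        },
    }


def build_collapse_body(term, search_field="title", group_field="category",
                        docs_per_group=5):
    """Option 2: collapse on the group field, expanding each group with inner_hits."""
    return {
        "query": {"match": {search_field: term}},
        "collapse": {
            "field": group_field,
            "inner_hits": {"name": "docs", "size": docs_per_group},
        },
    }


if __name__ == "__main__":
    print(json.dumps(build_agg_body("laptop"), indent=2))
    print(json.dumps(build_collapse_body("laptop"), indent=2))
```

Either body would go to the `_search` endpoint of the index. With the first one the documents are read from each bucket's `docs.hits.hits`, and with the second from each hit's `inner_hits.docs.hits.hits`.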
My questions are:
- Can we do faceting (further bucketing within buckets, probably a candidate for a sub-aggregation), sorting, and pagination within aggregated buckets? I know the documentation mentions that "top_hits" supports the "sort", "from" and "size" parameters, but I want to validate it.
- Do you see any major query performance bottlenecks with the aggregation approach?
- Is there any way other than aggregation and collapse to achieve this? My last resort would be to make as many ES queries as there are values of the group-by field. There are at least 4 different values of the group-by field, so at minimum there would be 4 separate queries to ES to get the documents for each value, and I would then stitch all the documents together (still grouped by the group-by field).
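To illustrate the first and last bullets, here is a rough Python sketch of what I have in mind. It again only builds request bodies and reshapes responses, with no cluster calls. The field names `title`, `category` and `brand`, the aggregation names, and the helper names are all hypothetical.

```python
def build_faceted_aggs(group_field="category", facet_field="brand",
                       docs_per_group=5, offset=0):
    """'aggs' section: one bucket per group, each with a nested facet
    and one page of documents from top_hits."""
    return {
        "by_group": {
            "terms": {"field": group_field},
            "aggs": {
                # further bucketing inside each group bucket
                "facet": {"terms": {"field": facet_field}},
                # sorted, paginated documents inside each group bucket
                "docs": {
                    "top_hits": {
                        "from": offset,
                        "size": docs_per_group,
                        "sort": [{"_score": {"order": "desc"}}],
                    }
                },
            },
        }
    }


def build_per_value_bodies(term, values, search_field="title",
                           group_field="category", size=5):
    """Fallback: one search body per known value of the group-by field."""
    return {
        value: {
            "size": size,
            "query": {
                "bool": {
                    "must": [{"match": {search_field: term}}],
                    "filter": [{"term": {group_field: value}}],
                }
            },
        }
        for value in values
    }


def stitch(responses):
    """Merge {group value: raw search response} into {group value: [documents]}."""
    return {
        value: [hit["_source"] for hit in response["hits"]["hits"]]
        for value, response in responses.items()
    }
```

With 4 group values, `build_per_value_bodies` yields 4 bodies, one request each, and `stitch` puts the returned documents back under their group value.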