Sorting aggregation buckets on string field

rikard · June 3, 2015, 2:52pm

I'm trying to group documents on a field and then sort them on another string field. My aim is to only get one document per group and sort them.

For example, let's say the mapping looks like (simplified):

{
..
  "group" : {"type" : "string"},
  "title" : {"type" : "string"}
}

I want to group documents on group, then sort the groups on title. For the grouping I am using a terms aggregation with field="group". For the sorting I am ordering the buckets by a sub-aggregation. For numeric values this works fine with min and max aggregations, but I can't find anything similar for string ordering. Any ideas?

This is the code for numeric sorting:

AbstractAggregationBuilder bucketsAgg = terms("group_agg_name")
        .field("group")
        .order(Terms.Order.aggregation("order_agg_name", false))
        .subAggregation(
                max("order_agg_name").field("numeric")
        )
        .subAggregation(
                topHits ...

        );

colings86 · June 3, 2015, 3:05pm

Sorting on non-numeric metric aggregations is not currently supported.

However, if I understand your request correctly, you are trying to sort the group buckets based on a property of an individual document (title). This would not work even if string sorting were supported, since multiple documents can fall into a bucket and ES would not know which document's title to use for sorting (or how to combine the documents' titles to produce a sorting value).

rikard · June 4, 2015, 7:36am

Ok, thanks. Can you think of any other way of removing duplicates (duplicate based on some property in the document) other than aggregations?

We are having performance problems. One query goes from 250ms to 9000ms when we use group aggregations to remove duplicate entries from the search results. I'm looking into filters or post filters as a possible alternative.

colings86 · June 4, 2015, 11:26am

Could you explain a bit more about your use-case and what you are trying to achieve? Some example documents and queries in a cURL recreation (in a gist or something similar) might also help.

rikard · June 5, 2015, 10:10am

Yes, one case we have is searching for music tracks, where you search for a track title. A track may exist in several versions, which should not be shown multiple times. To filter out the duplicates we do a group aggregation on a grouping string for the track. The number of tracks is about 30 million.

I could give you some documents etc., but I see now that the problem seems to be using a string for grouping. When I use an integer type, for example, I get a response quickly. Should I avoid using strings for field and terms aggregations? Seems really strange. I have to test more.

rikard · June 5, 2015, 11:42am

Or rather, the number of unique values for the field I am grouping on seems to decide how long the aggregation takes. I thought my aggregation was done only on the search result, and in that case I wouldn't expect this to have such a big effect, but maybe I am actually aggregating on all the data or something.

For example, this request should only aggregate on the search result, right?

{
  "from": 0,
  "size": 10,
  "query": {
    "match": {
      "title": {
        "query": "Thriller",
        "type": "boolean"
      }
    }
  },
  "aggregations": {
    "group_agg": {
      "terms": {
        "field": "group",
        "size": 10,
        "order": {
          "order_agg": "desc"
        }
      },
      "aggregations": {
        "order_agg": {
          "max": {
            "script": "_score"
          }
        },
        "group_agg_top_hit": {
          "top_hits": {
            "size": 1
          }
        }
      }
    }
  }
}

rikard · June 5, 2015, 12:42pm

Update on this.
I read https://github.com/elastic/elasticsearch/issues/5498 and added executionHint("map") to the terms aggregation, and that seems to have solved the problem.
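
For reference, a minimal sketch of what the same change could look like in the request body posted earlier. In the REST API the terms aggregation takes this setting as "execution_hint", which is the counterpart of the Java builder's executionHint call. Everything else is copied from the request above; placing the hint next to "field" and "size" is an assumption about the poster's setup, not something shown in the thread.

```json
{
  "from": 0,
  "size": 10,
  "query": {
    "match": {
      "title": {
        "query": "Thriller",
        "type": "boolean"
      }
    }
  },
  "aggregations": {
    "group_agg": {
      "terms": {
        "field": "group",
        "size": 10,
        "execution_hint": "map",
        "order": {
          "order_agg": "desc"
        }
      },
      "aggregations": {
        "order_agg": {
          "max": {
            "script": "_score"
          }
        },
        "group_agg_top_hit": {
          "top_hits": {
            "size": 1
          }
        }
      }
    }
  }
}
```

With "map", the aggregation builds its buckets directly from the field values of the documents that match the query rather than from field-wide ordinals. That would be consistent with the observation above that the run time tracked the number of unique values in the field rather than the size of the search result.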
