Composite Aggregation Sorting Based on Field Which is Not Key

mstzn · September 10, 2020, 10:31am

Hello Everyone,

I have a composite aggregation that groups documents into buckets using a terms source on a numeric field. A composite aggregation orders its buckets by the natural order of the key.
I want to order the buckets by the maximum date of another field instead.

In the code block below, I don't want to order by fromNumber. Each document has a startDate field, and I want to sort the buckets by the most recent startDate in each bucket.

Is there any way to do this?

"composite": {
  "size": 20,
  "sources": [
    {
      "byFromNumber": {
        "terms": {
          "field": "fromNumber",
          "missing_bucket": false,
          "order": "asc"
        }
      }
    }
  ],
  "after": {
    "byFromNumber": ""
  }
},

Mark_Harwood · September 10, 2020, 11:22am

It's one of those "it depends" answers I'm afraid.

How many shards/indices do you have?
How many unique numbers do you group on?

In a distributed system the constraints of how much data you can carry back from each shard make it complex - the same way the fox, the chicken and the grain problem is complicated by the constraint of a small boat.
This wizard walks through some of the options.

mstzn · September 10, 2020, 12:00pm

Thank you for the response, @Mark_Harwood.

I have 1 index with 3 shards. There can be up to 500k unique numbers. Here is my full Elasticsearch request. I simply want to get the first 20 buckets ordered by callStartStamp descending.

{
  "size": 0,
  "query": {
    "bool": {
      "filter": [
        {
          "term": {
            "studenId": {
              "value": 8525,
              "boost": 1
            }
          }
        }
      ],
      "adjust_pure_negative": true,
      "boost": 1
    }
  },
  "sort": [
    {
      "callStartStamp": {
        "order": "desc"
      }
    }
  ],
  "aggregations": {
    "fromNumberAgg": {
      "composite": {
        "size":  20,
        "sources": [
          {
            "byFromNumber": {
              "terms": {
                "field": "fromNumber",
                "missing_bucket": false,
                "order": "asc"
              }
            }
          }
        ],
        "after": {
          "byFromNumber": ""
        }
      },
      "aggregations": {
        "hits": {
          "top_hits": {
            "from": 0,
            "size": 1000,
            "version": false,
            "seq_no_primary_term": false,
            "explain": false,
            "sort": [
              {
                "callStartStamp": {
                  "order": "desc"
                }
              }
            ]
          }
        },
        "lastCallStart": {
          "max": {
            "field": "callStartStamp"
          }
        },
        "sortByCallStart": {
          "bucket_sort": {
            "sort": [
              {
                "lastCallStart": {
                  "order": "desc"
                }
              }
            ],
            "from": 0,
            "gap_policy": "SKIP"
          }
        }
      }
    }
  }
}

Mark_Harwood · September 10, 2020, 1:07pm

Looks like you want 1,000 records for each of those top 20.
However, to ensure accurate results in a terms aggregation (the one you want to group on), we ask each shard for more than 20 results. These are promising candidates for the final cut, and only by merging results from multiple shards do we get an accurate picture of each candidate. So we throw away a lot of candidates in the final merge, including the up to 1,000 top_hits that accompany each of them.
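To illustrate the merge described above, here is a toy simulation in Python. It is only a sketch and not how Elasticsearch is implemented: the shard contents, the shard_size value and the function names are all made up for illustration. Each shard returns its own best candidates, the coordinating step merges them, and everything outside the final cut is discarded along with whatever payload it carried.

```python
def shard_top_candidates(docs, shard_size):
    """Return one shard's top `shard_size` callers by latest call timestamp."""
    latest = {}
    for caller, stamp in docs:
        latest[caller] = max(latest.get(caller, stamp), stamp)
    ranked = sorted(latest.items(), key=lambda kv: kv[1], reverse=True)
    return ranked[:shard_size]


def merge_shards(shard_results, size):
    """Merge per-shard candidates, keep the global top `size`, count the rest."""
    merged = {}
    for candidates in shard_results:
        for caller, stamp in candidates:
            merged[caller] = max(merged.get(caller, stamp), stamp)
    ranked = sorted(merged.items(), key=lambda kv: kv[1], reverse=True)
    top = ranked[:size]
    discarded = len(ranked) - len(top)
    return top, discarded


# Hypothetical data: (caller, callStartStamp) pairs spread over 3 shards.
shards = [
    [("A", 10), ("B", 50), ("C", 5), ("D", 1)],
    [("A", 60), ("E", 40), ("F", 3)],
    [("B", 20), ("G", 30), ("H", 2), ("A", 7)],
]

# Each shard over-fetches: 3 candidates although only 2 are wanted overall.
per_shard = [shard_top_candidates(docs, shard_size=3) for docs in shards]
top, discarded = merge_shards(per_shard, size=2)

print(top)        # the 2 most recently active callers after merging
print(discarded)  # candidates (and any top_hits they carried) thrown away
```

In this toy run six distinct candidates reach the merge step and only two survive, so any per-candidate documents shipped with the other four would have been transferred for nothing.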
For this reason it would be better to split this into 2 queries - one to get the 20 most-recently-active callers and then a follow-up request to get just their call histories.
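One way to sketch the two-request split is shown below, as Python functions that only build the request bodies. The field names (studenId, fromNumber, callStartStamp) come from the request above. Using a terms aggregation ordered by a max sub-aggregation for the first request is an assumption on my part, not something stated in this thread, and the function names and the hits_per_caller parameter are hypothetical.

```python
import json


def top_callers_request(student_id, size=20):
    """Request 1: the `size` callers with the most recent callStartStamp."""
    return {
        "size": 0,
        "query": {"bool": {"filter": [{"term": {"studenId": student_id}}]}},
        "aggs": {
            "byFromNumber": {
                "terms": {
                    "field": "fromNumber",
                    "size": size,
                    # Order buckets by the max sub-aggregation defined below.
                    "order": {"lastCallStart": "desc"},
                },
                "aggs": {
                    "lastCallStart": {"max": {"field": "callStartStamp"}}
                },
            }
        },
    }


def extract_from_numbers(response):
    """Pull the bucket keys, in order, out of the response to request 1."""
    buckets = response["aggregations"]["byFromNumber"]["buckets"]
    return [bucket["key"] for bucket in buckets]


def call_history_request(student_id, from_numbers, hits_per_caller):
    """Request 2: call histories for just the callers found by request 1."""
    numbers = list(from_numbers)
    return {
        "size": 0,
        "query": {
            "bool": {
                "filter": [
                    {"term": {"studenId": student_id}},
                    {"terms": {"fromNumber": numbers}},
                ]
            }
        },
        "aggs": {
            "byFromNumber": {
                "terms": {"field": "fromNumber", "size": len(numbers)},
                "aggs": {
                    "hits": {
                        "top_hits": {
                            "size": hits_per_caller,
                            "sort": [{"callStartStamp": {"order": "desc"}}],
                        }
                    }
                },
            }
        },
    }


if __name__ == "__main__":
    print(json.dumps(top_callers_request(8525), indent=2))
```

The second request returns its buckets in the terms aggregation's default order, so the client would need to re-order them using the key order from the first response. Because the second request is restricted to the chosen callers, the per-caller hits are only fetched for buckets that are actually kept.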

system · October 8, 2020, 1:07pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
