Composite Aggregation: Sorting by a Field That Is Not a Key

Hello Everyone,

I have a composite aggregation that groups documents into buckets by a terms source on a number-like field. The composite aggregation orders buckets by the natural order of the key.
I want to order the buckets by the max date of another field instead.

In the code block below, I don't want the buckets ordered by fromNumber. Each document has a startDate field, and I want to sort the buckets by the most recent startDate in each bucket.

Is there any way to do this?

      "composite": {
        "size": 20,
        "sources": [
          {
            "byFromNumber": {
              "terms": {
                "field": "fromNumber",
                "missing_bucket": false,
                "order": "asc"
              }
            }
          }
        ],
        "after": {
          "byFromNumber": ""
        }
      },

It's one of those "it depends" answers I'm afraid.

How many shards/indices do you have?
How many unique numbers do you group on?

In a distributed system, the limit on how much data you can carry back from each shard makes this complex, in the same way that the fox, the chicken and the grain problem is complicated by the constraint of a small boat.
This wizard walks through some of the options.

Thank you for the response, @Mark_Harwood.

I have 1 index with 3 shards. There can be up to 500k unique numbers. Here is my full ES request. I simply want to get the first 20 buckets ordered by callStartStamp descending.

{
  "size": 0,
  "query": {
    "bool": {
      "filter": [
        {
          "term": {
            "studenId": {
              "value": 8525,
              "boost": 1
            }
          }
        }
      ],
      "adjust_pure_negative": true,
      "boost": 1
    }
  },
  "sort": [
    {
      "callStartStamp": {
        "order": "desc"
      }
    }
  ],
  "aggregations": {
    "fromNumberAgg": {
      "composite": {
        "size": 20,
        "sources": [
          {
            "byFromNumber": {
              "terms": {
                "field": "fromNumber",
                "missing_bucket": false,
                "order": "asc"
              }
            }
          }
        ],
        "after": {
          "byFromNumber": ""
        }
      },
      "aggregations": {
        "hits": {
          "top_hits": {
            "from": 0,
            "size": 1000,
            "version": false,
            "seq_no_primary_term": false,
            "explain": false,
            "sort": [
              {
                "callStartStamp": {
                  "order": "desc"
                }
              }
            ]
          }
        },
        "lastCallStart": {
          "max": {
            "field": "callStartStamp"
          }
        },
        "sortByCallStart": {
          "bucket_sort": {
            "sort": [
              {
                "lastCallStart": {
                  "order": "desc"
                }
              }
            ],
            "from": 0,
            "gap_policy": "SKIP"
          }
        }
      }
    }
  }
}

It looks like you want up to 1,000 records for each of those top 20.
However, to keep the results of terms aggregations (the kind you want to group on) accurate, we ask each shard for more than 20 results. These are promising candidates for the final cut, and only by merging results from multiple shards do we get an accurate picture of each candidate. So we throw away a lot of candidates in the final merge, along with the up to 1,000 top_hits that accompany each of them.
For this reason it would be better to split this into 2 queries: one to get the 20 most recently active callers, and then a follow-up request to get just their call histories.
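
A rough sketch of what that two-request approach could look like, reusing the field names from the request above (studenId, fromNumber, callStartStamp). The aggregation names recentCallers and byFromNumber are only illustrative, and this assumes a plain terms aggregation ordered by a max sub-aggregation performs acceptably with around 500k unique numbers, which is worth checking against your own data. The first request finds the 20 most recently active callers without carrying any top_hits:

```json
{
  "size": 0,
  "query": {
    "bool": {
      "filter": [
        { "term": { "studenId": 8525 } }
      ]
    }
  },
  "aggs": {
    "recentCallers": {
      "terms": {
        "field": "fromNumber",
        "size": 20,
        "order": { "lastCallStart": "desc" }
      },
      "aggs": {
        "lastCallStart": {
          "max": { "field": "callStartStamp" }
        }
      }
    }
  }
}
```

The follow-up request is then restricted to those callers and fetches their call histories. The values in the terms filter are placeholders; in practice they would be the 20 bucket keys returned by the first request:

```json
{
  "size": 0,
  "query": {
    "bool": {
      "filter": [
        { "term": { "studenId": 8525 } },
        { "terms": { "fromNumber": ["FROM_NUMBER_1", "FROM_NUMBER_2"] } }
      ]
    }
  },
  "aggs": {
    "byFromNumber": {
      "terms": {
        "field": "fromNumber",
        "size": 20,
        "order": { "lastCallStart": "desc" }
      },
      "aggs": {
        "lastCallStart": {
          "max": { "field": "callStartStamp" }
        },
        "hits": {
          "top_hits": {
            "size": 1000,
            "sort": [
              { "callStartStamp": { "order": "desc" } }
            ]
          }
        }
      }
    }
  }
}
```

The top_hits size of 1000 is copied from the original request. As far as I know, top_hits is capped by the index.max_inner_result_window index setting (default 100), so that setting may need raising, or the size lowering, for this to run.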
