Bucket sort aggregation with actual documents

Hi team,

I want to get aggregated data with pagination, so I tried to use the `bucket_sort` aggregation. I referred to this URL.

Now I am able to get aggregated data, but the problem is that the search response only contains the aggregation buckets with their `key` and `doc_count` fields.

My query:

```json
{
    "from": 0,
    "size": 0,
    "query": {
        "bool": {
            "must": [
                {
                    "query_string": {
                        "query": "name:hello*^10000.0"
                    }
                }
            ],
            "filter": [
                {
                    "terms": {
                        "locale": [
                            "XYZ",
                            "ABC"
                        ],
                        "boost": 1.0
                    }
                }
            ],
            "adjust_pure_negative": true,
            "boost": 1.0
        }
    },
    "aggregations": {
        "groupbyid": {
            "terms": {
                "field": "groupbyid.raw",
                "size": 10000
            },
            "aggs": {
                "test_bucket_sort": {
                    "bucket_sort": {
                        "size": 2,
                        "from": 3
                    }
                }
            }
        }
    }
}
```

The response I get:

```json
{
    "took": 16,
    "timed_out": false,
    "_shards": {
        "total": 4,
        "successful": 4,
        "skipped": 0,
        "failed": 0
    },
    "hits": {
        "total": {
            "value": 1311,
            "relation": "eq"
        },
        "max_score": null,
        "hits": []
    },
    "aggregations": {
        "groupbyid": {
            "doc_count_error_upper_bound": 0,
            "sum_other_doc_count": 0,
            "buckets": [
                {
                    "key": "id1",
                    "doc_count": 3
                },
                {
                    "key": "id3",
                    "doc_count": 3
                }
            ]
        }
    }
}
```

I also want the `_source` of all 3 documents in each bucket. I have tried a couple of things, but none of them seem to work.

I got this error when I tried to use `top_hits` with the `include` parameter:

```
[bucket_sort] unknown field [_source]
```

I am on Elasticsearch version 7.7.

Please help me out with this.

Thanks in advance.

I suggest looking into the composite aggregation; you can implement `groupbyid` as a values source. I am not sure what you need bucket sort for: if you need sorting, you can sort in the composite aggregation. To retrieve the source of all documents in a bucket, you can use `scripted_metric`, e.g.:

```json
"all_docs": {
  "scripted_metric": {
    "init_script": "state.docs = []",
    "map_script": "state.docs.add(new HashMap(params['_source']))",
    "combine_script": "return state.docs",
    "reduce_script": "def docs = []; for (s in states) { for (d in s) { docs.add(d); } } return docs"
  }
}
```

But be careful: if you have a lot of docs in a bucket, this can cause a memory explosion, because the script copies the source of every matching document into the aggregation state.
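If the buckets only hold a handful of documents, as in the question (3 per bucket), a `top_hits` sub-aggregation may be a lighter option than `scripted_metric`. The `[bucket_sort] unknown field [_source]` error indicates that `_source` was parsed as an option of `bucket_sort`, which only accepts `sort`, `from`, `size` and `gap_policy`; `top_hits` has to be its own sub-aggregation next to it. A sketch reusing the terms and bucket sort settings from the question; the `docs` name and the `includes` field list (assuming `name` and `locale` are stored in `_source`) are hypothetical, and the query is left out for brevity:

```json
{
  "size": 0,
  "aggregations": {
    "groupbyid": {
      "terms": {
        "field": "groupbyid.raw",
        "size": 10000
      },
      "aggs": {
        "docs": {
          "top_hits": {
            "size": 3,
            "_source": {
              "includes": ["name", "locale"]
            }
          }
        },
        "test_bucket_sort": {
          "bucket_sort": {
            "size": 2,
            "from": 3
          }
        }
      }
    }
  }
}
```

`top_hits` returns at most `size` hits per bucket (3 by default), so `size` would need to be raised if buckets can hold more documents.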

Hi @Hendrik_Muhs,

Thank you for your reply. I wanted to combine pagination with aggregation, so I used bucket sort.

Right now I am thinking of using the composite aggregation instead, since the script-based approach could be an expensive operation, as you said.
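A minimal sketch of such a composite request, assuming the same `groupbyid.raw` keyword field as in the question; the page size of 2, the `docs` sub-aggregation name and the `top_hits` size are illustrative. Each composite response contains an `after_key`; passing it back as `after` fetches the next page, and `after` is left out for the first page. The value `"id3"` below is only a placeholder for the `after_key` of the previous page, and sorting is configured with `order` on the source. The `query` from the question would be added to restrict the documents as before:

```json
{
  "size": 0,
  "aggregations": {
    "groupbyid": {
      "composite": {
        "size": 2,
        "sources": [
          {
            "groupbyid": {
              "terms": {
                "field": "groupbyid.raw",
                "order": "asc"
              }
            }
          }
        ],
        "after": {
          "groupbyid": "id3"
        }
      },
      "aggs": {
        "docs": {
          "top_hits": {
            "size": 3
          }
        }
      }
    }
  }
}
```

Unlike `bucket_sort`, which can only page through the buckets its parent `terms` aggregation already returned (here up to 10000), the composite aggregation walks through all buckets one page at a time.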

Thanks