Aggregation Pagination

Hello,

I am using Elasticsearch aggregations to group results. Specifically, I use a composite aggregation at the top level, with a bucket_selector sub-aggregation that decides which buckets the composite aggregation returns.

My question is: how do I implement pagination here? My query looks like this:

{
    "size": 0,
    "aggs": {
        "Project_aggregation": {
            "composite": {
                "size": 10,
                "sources": [{
                    "terms_aggregation": {
                        "terms": {
                            "field": "ProjectID"
                        }
                    }
                }]
            },
            "aggs": {
                "latest_project_timestamp": {
                    "max": {
                        "field": "ProjectStartTime"
                    }
                },
                "project_which_satisfies_filters": {
                    "filter": {
                        "match": {
                            "Category": "Small"
                        }
                    },
                    "aggs": {
                        "start_time": {
                            "max": {
                                "field": "ProjectStartTime"
                            }
                        }
                    }
                },
                "find_if_latest_project_is_same_as_project_which_satisfies_filters": {
                    "bucket_selector": {
                        "buckets_path": {
                            "LatestProjectStartTime": "latest_project_timestamp",
                            "ProjectWhichSatisfiesFiltersStartTime": "project_which_satisfies_filters>start_time"
                        },
                        "script": "params.LatestProjectStartTime == params.ProjectWhichSatisfiesFiltersStartTime"
                    }
                }
            }
        }
    }
}

If I set the page size to 10 and the bucket selector removes 4 buckets, only 6 buckets are left. Sometimes I get an empty result that still carries an after key, and I have to loop several times to reach the next set of results.

I can work around this by issuing several queries until I have the desired number of buckets, but this is not optimal.
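
To make the workaround concrete, here is a rough sketch of the loop I mean. It is not tied to any client library: `search` is a hypothetical callable that takes the request body and returns the parsed response, and `fetch_page` is just a name I made up. It keeps requesting composite pages until enough buckets survive the bucket selector, then hands back the key of the last returned bucket as the cursor for the next page.

```python
import copy


def fetch_page(search, body, agg_name, wanted, after=None):
    """Collect up to `wanted` surviving buckets, issuing as many requests as needed.

    `search` is a hypothetical callable: request body in, parsed response out.
    Returns (buckets, cursor); cursor is None once nothing is left to fetch.
    """
    body = copy.deepcopy(body)  # leave the caller's query untouched
    composite = body["aggs"][agg_name]["composite"]
    if after is not None:
        composite["after"] = after

    kept = []
    while True:
        agg = search(body)["aggregations"][agg_name]
        kept.extend(agg["buckets"])
        after_key = agg.get("after_key")
        # Stop when the page is full or the composite has no more keys.
        if len(kept) >= wanted or after_key is None:
            break
        composite["after"] = after_key  # page may be empty, keep going

    page = kept[:wanted]
    # Resume from the last bucket actually returned, so that surplus
    # buckets from the final request are not skipped on the next call.
    exhausted = after_key is None and len(kept) <= wanted
    cursor = None if exhausted or not page else page[-1]["key"]
    return page, cursor
```

In practice `search` would wrap whatever client call sends the request. This still costs several round trips per page, which is exactly the part I would like to avoid.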

An alternative to the query above would be useful too. Basically, for each project ID I want to retrieve only the latest document; if that latest document does not satisfy certain filters, the project should not be returned at all. Please let me know if there is a way to avoid the bucket selector aggregation, which is what forces me to run multiple queries.
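
In case the requirement is unclear, this is the rule I am after, written as plain Python over documents that are already in memory. It only illustrates the semantics and is not a proposed solution; `latest_matching` is a made-up name, and the exact equality check on `Category` stands in for the match filter in my query.

```python
def latest_matching(docs, category="Small"):
    """Keep the latest document per ProjectID, then drop it if it fails the filter."""
    latest = {}
    for doc in docs:
        current = latest.get(doc["ProjectID"])
        # Remember only the document with the greatest ProjectStartTime.
        if current is None or doc["ProjectStartTime"] > current["ProjectStartTime"]:
            latest[doc["ProjectID"]] = doc
    # The filter applies to the latest document only, never to older ones.
    return [doc for doc in latest.values() if doc["Category"] == category]
```

So a project whose older document is "Small" but whose latest document is not should be left out entirely.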

Thanks,
Raonic
