Hello,
I am using Elasticsearch aggregations to group results. More specifically, I am using a composite aggregation at the top level, with a bucket selector sub-aggregation that decides which of the top-level buckets are returned.
How can I implement pagination here? My query looks as follows:
{
  "size": 0,
  "aggs": {
    "Project_aggregation": {
      "composite": {
        "size": 10,
        "sources": [{
          "terms_aggregation": {
            "terms": {
              "field": "ProjectID"
            }
          }
        }]
      },
      "aggs": {
        "latest_project_timestamp": {
          "max": {
            "field": "ProjectStartTime"
          }
        },
        "project_which_satisfies_filters": {
          "filter": {
            "match": {
              "Category": "Small"
            }
          },
          "aggs": {
            "start_time": {
              "max": {
                "field": "ProjectStartTime"
              }
            }
          }
        },
        "find_if_latest_project_is_same_as_project_which_satisfies_filters": {
          "bucket_selector": {
            "buckets_path": {
              "LatestProjectStartTime": "latest_project_timestamp",
              "ProjectWhichSatisfiesFiltersStartTime": "project_which_satisfies_filters>start_time"
            },
            "script": "params.LatestProjectStartTime == params.ProjectWhichSatisfiesFiltersStartTime"
          }
        }
      }
    }
  }
}
If I set the page size to 10 and the bucket selector removes 4 buckets, only 6 buckets are returned. Sometimes I get an empty result that still carries an after key, and I have to loop several times to get the next set of results.
I could keep issuing queries until I have collected the desired number of buckets, but this is not optimal.
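For reference, the loop described above can be sketched roughly as follows. This is only an illustration: `collect_page` and `run_search` are hypothetical names, and `run_search(after_key)` is assumed to send the query above (with `"after": after_key` added to the `composite` block when it is not `None`) and return the `Project_aggregation` part of the response. The sketch also assumes the response omits `after_key` once the composite sources are exhausted.

```python
from typing import Callable, List, Optional, Tuple


def collect_page(
    run_search: Callable[[Optional[dict]], dict],
    page_size: int,
    after_key: Optional[dict] = None,
) -> Tuple[List[dict], Optional[dict]]:
    """Request composite pages until at least page_size buckets survive
    the bucket selector, or until there is nothing left to page through.

    run_search is a hypothetical callable (not a client API): it takes the
    current after key (or None for the first page) and returns the
    "Project_aggregation" dict from the search response.
    """
    buckets: List[dict] = []
    while len(buckets) < page_size:
        agg = run_search(after_key)
        # A page may be empty because the bucket selector dropped everything.
        buckets.extend(agg.get("buckets", []))
        # Always continue from the after key returned by the server.
        after_key = agg.get("after_key")
        if after_key is None:
            # Assumed to mean the composite aggregation is exhausted.
            break
    # The result is not truncated, so it may hold more than page_size buckets;
    # truncating would make the returned after key skip the extra buckets.
    return buckets, after_key
```

The returned after key is then passed back in to fetch the next page. Each call can still cost several round trips, which is exactly the inefficiency in question.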
An alternative to the above query would be useful too. Basically, for each project ID I want to retrieve only the latest document; if that latest document does not satisfy certain filters, the project should not be returned at all. Please let me know if there is a way to avoid the bucket selector aggregation, which is what forces me to run multiple queries.
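To pin down the intended result, here is a small in-memory sketch of the semantics. It is not an Elasticsearch query; the function name `latest_matching` and the sample documents are made up for illustration, and the `match` filter on `Category` is simplified to plain equality.

```python
def latest_matching(docs, category="Small"):
    """Return, per ProjectID, the latest document (by ProjectStartTime),
    but only if that latest document has the requested Category."""
    latest = {}
    for doc in docs:
        current = latest.get(doc["ProjectID"])
        if current is None or doc["ProjectStartTime"] > current["ProjectStartTime"]:
            latest[doc["ProjectID"]] = doc
    # Projects whose latest document fails the filter are dropped entirely;
    # an older matching document must not be returned in its place.
    return [doc for doc in latest.values() if doc["Category"] == category]


# Hypothetical sample data, not taken from a real index.
docs = [
    {"ProjectID": 1, "ProjectStartTime": 10, "Category": "Small"},
    {"ProjectID": 1, "ProjectStartTime": 20, "Category": "Large"},
    {"ProjectID": 2, "ProjectStartTime": 5, "Category": "Large"},
    {"ProjectID": 2, "ProjectStartTime": 15, "Category": "Small"},
]
result = latest_matching(docs)
```

With this data, project 1 is excluded because its latest document is "Large", even though an older "Small" document exists, while project 2 is returned with its document at start time 15.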
Thanks,
Raonic