We want to use the search_after pagination strategy rather than the from-size strategy, as we understand that search_after performs better for deep pagination (high page numbers).
At the same time, we collapse our query results based on a specific field. We are aware of the limitation that search_after can't be applied when collapse is used.
Question A: Based on https://github.com/elastic/elasticsearch/issues/53115, composite aggregation can be used as an alternative to from-size pagination. Do I understand correctly that the composite aggregation is used to retrieve a page of "record ids", and a second query is then used to fetch the records by the ids returned from the aggregation?
Question B: Are there other pagination approaches, better than from-size performance-wise, that can be used together with field collapsing?
No. The challenge with any form of distributed aggregation under deep pagination is getting the remote shards to independently agree on which subset of terms (not docs or doc IDs) to focus on in each request.
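To make the "independently agree" point concrete, here is a minimal Python sketch of deterministic hash+modulo partitioning, the idea behind the partitioned terms agg mentioned in this answer. It is an illustration only: `zlib.crc32` is a stand-in hash (Elasticsearch uses its own internal hash), and the shard contents and the helper names `partition_of` / `terms_in_partition` are hypothetical. The point is that a term present on several shards lands in the same partition on every shard without any coordination between them.

```python
import zlib
from typing import Iterable, List


def partition_of(term: str, num_partitions: int) -> int:
    """Deterministically map a term to a partition in [0, num_partitions)."""
    # Stand-in hash (CRC32): any stable hash works for this illustration.
    return zlib.crc32(term.encode("utf-8")) % num_partitions


def terms_in_partition(shard_terms: Iterable[str], partition: int,
                       num_partitions: int) -> List[str]:
    """Terms one shard would consider for a given partition, computed
    locally with no knowledge of what other shards hold."""
    return sorted(t for t in shard_terms
                  if partition_of(t, num_partitions) == partition)


if __name__ == "__main__":
    # Hypothetical shard contents: "c3" and "c4" exist on both shards.
    shard_a = {"c1", "c2", "c3", "c4"}
    shard_b = {"c3", "c4", "c5"}
    for p in range(3):
        print(p, terms_in_partition(shard_a, p, 3),
              terms_in_partition(shard_b, p, 3))
```

Because every shard applies the same function, each request (one per partition) sees a consistent slice of the term space across all shards, and the partitions together cover every term exactly once.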
For scenarios where you want to do deep pagination there are broadly three approaches:
Composite agg - all shards work through results in order of a given grouping key (typically a field's terms, e.g. customerID).
Terms agg with partitioning - each shard works on an algorithmically determined subset of terms (using the same sort of hash+modulo N partitioning used to evenly and deterministically route docs to shards by id).
Transforms API - fuses related docs together at index time, rather than at query time like the approaches above.
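A minimal Python sketch of the request bodies for the three approaches, plus a paging loop for the composite agg. Assumptions, all hypothetical: the field `customerID` is taken from the example above, while the index names `orders` / `orders_by_customer`, the aggregation names `by_key`, `top_doc`, `num_docs`, and the `search(index, body)` callable (a stand-in for whatever client call you use) are illustrative only. Using a `top_hits` sub-agg with `size: 1` to approximate "one doc per key" is an assumption about how you might replace collapse, not a documented equivalent.

```python
from typing import Any, Callable, Dict, Iterator, Optional

Body = Dict[str, Any]


def composite_page_body(field: str, page_size: int,
                        after: Optional[Dict[str, Any]] = None) -> Body:
    """One page of a composite agg over `field` (keys come back in key order)."""
    composite: Body = {
        "size": page_size,
        "sources": [{field: {"terms": {"field": field}}}],
    }
    if after is not None:
        # Resume strictly after the last key returned by the previous page.
        composite["after"] = after
    return {
        "size": 0,  # no top-level hits; results live in the buckets
        "aggs": {"by_key": {
            "composite": composite,
            # Hypothetical: one representative doc per key, roughly like collapse.
            # Add a "sort" here to control which doc is picked.
            "aggs": {"top_doc": {"top_hits": {"size": 1}}},
        }},
    }


def partition_body(field: str, partition: int, num_partitions: int,
                   size: int = 10000) -> Body:
    """Terms agg restricted to one hash partition of the field's terms."""
    return {
        "size": 0,
        "aggs": {"by_key": {"terms": {
            "field": field,
            "include": {"partition": partition, "num_partitions": num_partitions},
            "size": size,
        }}},
    }


def transform_body(source: str, dest: str, field: str) -> Body:
    """Pivot transform grouping docs by `field` into a separate index."""
    return {
        "source": {"index": source},
        "pivot": {
            "group_by": {field: {"terms": {"field": field}}},
            "aggregations": {"num_docs": {"value_count": {"field": field}}},
        },
        "dest": {"index": dest},
    }


def iterate_composite(search: Callable[[str, Body], Body], index: str,
                      field: str, page_size: int = 100) -> Iterator[Body]:
    """Yield every composite bucket, following `after_key` page by page."""
    after: Optional[Dict[str, Any]] = None
    while True:
        resp = search(index, composite_page_body(field, page_size, after))
        agg = resp["aggregations"]["by_key"]
        if not agg["buckets"]:
            return  # an empty page means we have walked every key
        yield from agg["buckets"]
        after = agg.get("after_key")
        if after is None:
            return
```

Usage: `iterate_composite(my_search, "orders", "customerID")` walks all keys without from-size; `partition_body` is sent once per partition (0 to N-1) as separate requests; `transform_body` is the payload for `PUT _transform/<transform_id>`, after which the destination index can be paged with plain search_after. Note the trade-off: composite pages come back in key order, not in relevance order as a collapsed search would.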
This wizard can help walk through the options in more detail.