We recently started using the ES composite aggregation to generate a report. The report requires sorting, and the result set is large (sometimes up to 1M records), so we need pagination.
However, we have identified duplicated records (some records are duplicated around 10 times), and I am trying to understand the consistency behavior of the composite aggregation during pagination, especially when the index is being frequently updated or having documents deleted.
Our regular indexing rate is about 70 TPS, mostly updates to the current month's index. There is also an ongoing backfill that adds an extra ~70 TPS, which can land on any of the monthly indices.
I have a few questions:
- Does the composite aggregation produce a consistent report across pages?
- Under the hood, how does after_key work?
- I know there are the scan and scroll searches. Scan won't fit our use case since our data needs to be sorted. If we favor consistency of the report, would the Scroll API be a better option? What are the pros and cons of Scroll vs. composite aggregation?
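For context, here is roughly how we paginate with after_key. This is a minimal Python sketch that only builds the request bodies; the index source name and field ("account", "account_id") and the page size are placeholders, not our real schema:

```python
# Sketch of composite-aggregation pagination: each response carries an
# "after_key" (the composite key of the last bucket on that page), and we
# pass it back as "after" in the next request to fetch the following page.
# Field/source names below are placeholders for illustration only.

def build_page_request(page_size, after_key=None):
    """Build the search body for one page of a composite aggregation."""
    composite = {
        "size": page_size,
        "sources": [
            {"account": {"terms": {"field": "account_id", "order": "asc"}}}
        ],
    }
    if after_key is not None:
        # Resume from the bucket immediately after this key.
        composite["after"] = after_key
    return {"size": 0, "aggs": {"report": {"composite": composite}}}

# First page: no "after" parameter.
first_page = build_page_request(1000)

# Subsequent pages: feed the previous response's after_key back in.
next_page = build_page_request(1000, after_key={"account": "acct-42"})
```

Each page is issued as a separate search request, which is why I am unsure what happens when documents are indexed, updated, or deleted between two pages.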