Use search or scroll for a large dataset that needs aggregations?

Hi! I'm new to Elasticsearch, and I have a particular use case for which I don't know whether I should use a basic search or a scroll search.
I have an index in which I periodically save copies of JSON documents. Each JSON document has an ID that refers to an external data entity. Documents are inserted every day, but not systematically for every entity ID.
At a given date A, I want to retrieve and process the most recent entry for each existing entity ID. In other words, I want to filter documents dated before date A, sort them by descending date, and group them by ID, keeping only the first document in each group.
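
In plain Python, the selection described above amounts to roughly the following. This is a minimal sketch, assuming each document is a dict with hypothetical `entity_id` and `date` fields (dates as `datetime.date` values) and treating "before date A" as strictly earlier; the real field names in the index may differ.

```python
from datetime import date
from typing import Dict, Iterable, List


def latest_per_id(docs: Iterable[dict], date_a: date) -> Dict[str, dict]:
    """Return, for each entity ID, its most recent document dated before date_a.

    Assumes hypothetical fields: doc["entity_id"] (str) and doc["date"] (datetime.date).
    """
    # 1. Filter: keep only documents strictly before date A.
    eligible: List[dict] = [d for d in docs if d["date"] < date_a]
    # 2. Sort by descending date.
    eligible.sort(key=lambda d: d["date"], reverse=True)
    # 3. Group by ID, keeping only the first (i.e. most recent) document per ID.
    latest: Dict[str, dict] = {}
    for d in eligible:
        latest.setdefault(d["entity_id"], d)
    return latest


docs = [
    {"entity_id": "a", "date": date(2023, 1, 1), "v": 1},
    {"entity_id": "a", "date": date(2023, 3, 1), "v": 2},
    {"entity_id": "b", "date": date(2023, 2, 1), "v": 1},
    {"entity_id": "a", "date": date(2023, 6, 1), "v": 3},  # after date A, ignored
]
result = latest_per_id(docs, date(2023, 5, 1))
print({k: v["v"] for k, v in result.items()})  # {'a': 2, 'b': 1}
```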

Here are the two solutions:

  • Using search with a collapse clause on the ID, then iterating over pages until I have fetched a document for every ID.
  • Using scroll to fetch all documents dated before date A, sorted by descending date, then handling the grouping manually, because collapse cannot be used with scroll.
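
For reference, the two request bodies could look roughly like this, together with the manual grouping the scroll approach would need. This is a minimal sketch under stated assumptions: the field names `entity_id` and `date` are hypothetical, `entity_id` is assumed to be a `keyword` field (collapse needs a keyword or numeric field with doc values), and actually sending the requests with a client is left out. Because scroll results arrive sorted by descending date, the first hit seen for an ID is its most recent version, so the grouping reduces to skipping IDs already seen.

```python
from typing import Iterable, Iterator


def collapse_query(date_a_iso: str, page_size: int, offset: int) -> dict:
    """Option 1: one hit per ID (its most recent version), paged with from/size."""
    return {
        "query": {"range": {"date": {"lt": date_a_iso}}},  # hypothetical "date" field
        "sort": [{"date": "desc"}],
        "collapse": {"field": "entity_id"},  # hypothetical keyword field
        "from": offset,
        "size": page_size,
    }


def scroll_query(date_a_iso: str, page_size: int) -> dict:
    """Option 2: every version before date A, newest first (no collapse)."""
    return {
        "query": {"range": {"date": {"lt": date_a_iso}}},
        "sort": [{"date": "desc"}],
        "size": page_size,
    }


def dedupe_sorted_hits(hits: Iterable[dict]) -> Iterator[dict]:
    """Yield only the first hit per entity ID from date-descending hits."""
    seen = set()
    for hit in hits:
        entity_id = hit["_source"]["entity_id"]
        if entity_id not in seen:
            seen.add(entity_id)
            yield hit


# Simulated scroll pages (already sorted by descending date across pages).
pages = [
    [{"_source": {"entity_id": "a", "date": "2023-03-01"}},
     {"_source": {"entity_id": "b", "date": "2023-02-01"}}],
    [{"_source": {"entity_id": "a", "date": "2023-01-01"}}],
]
latest = list(dedupe_sorted_hits(hit for page in pages for hit in page))
print([h["_source"]["entity_id"] for h in latest])  # ['a', 'b']
```

Note that `from`/`size` paging is capped by the `index.max_result_window` setting (10,000 by default), which is one reason the collapse approach becomes awkward on very large result sets, while the scroll approach still transfers every older version only to discard it.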

The first solution does not seem suitable, since I may be dealing with very large datasets, which is what the scroll API seems to be designed for.
But scrolling does not seem convenient either, given that I can have hundreds of versions per ID, some dating back years before date A, when I only need the most recent one.

What would be the best approach for this use case?
Thanks for any hints, maybe I'm not headed in the right direction here!
