Use search or scroll for large dataset which needs aggregations

nboisnea1 · November 6, 2023, 10:45pm

Hi! I'm new to Elasticsearch and I have a particular use case for which I don't know if I should use a basic search or a scroll search.
I have an index in which I periodically save copies of JSON documents. Each JSON document has an ID that refers to an external data entity. Documents are inserted every day, but not necessarily for every entity ID.
At a given date A, I want to retrieve and process the most recent entry for each existing entity ID. In other words, I want to filter documents whose date is before date A, sort them by descending date, and group them by ID, keeping only the first document of each group.
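
To make the filter / sort / group steps concrete, here is a rough sketch of what such a request body could look like, written as a plain Python dict so it can be inspected without a cluster. The field names `entity_id` and `date`, the helper name, and the page size are assumptions for illustration only; they are not taken from my real mapping. The collapse field would need to be a keyword or numeric field.

```python
import json


def latest_per_entity_body(date_a, id_field="entity_id", date_field="date", size=100):
    """Build a search body: keep documents dated before date_a, newest first,
    and collapse on the ID field so that only the top hit per ID is returned."""
    return {
        "size": size,
        # Filter context: only documents strictly before date A.
        "query": {"bool": {"filter": [{"range": {date_field: {"lt": date_a}}}]}},
        # Newest first, so the collapsed hit for each ID is its most recent entry.
        "sort": [{date_field: {"order": "desc"}}],
        # One hit per distinct ID value.
        "collapse": {"field": id_field},
    }


body = latest_per_entity_body("2023-11-06")
print(json.dumps(body, indent=2))
```

The dict is what would be sent as the body of a `_search` request; paging through the collapsed results is a separate concern.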

I see two possible solutions:

1. Use search with a collapse clause on the ID, then iterate over pages until I have fetched the document for each ID.
2. Use scroll to fetch all documents whose date is before date A, sorted by descending date, then handle the grouping manually, because collapse cannot be used with scroll.
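
For the second option, the manual grouping could be sketched roughly as below. It assumes the hits arrive already sorted by descending date and that each hit carries its fields under `_source`; the field name `entity_id`, the helper name, and the sample hits are hypothetical.

```python
def keep_latest_per_id(hits, id_field="entity_id"):
    """Keep the first hit seen for each ID.

    Because hits are assumed to be sorted newest first, the first hit
    encountered for an ID is its most recent entry.
    """
    latest = {}
    for hit in hits:
        key = hit["_source"][id_field]
        if key not in latest:
            latest[key] = hit
    return latest


# Made-up hits standing in for the pages returned by a scroll, newest first.
hits = [
    {"_source": {"entity_id": "a", "date": "2023-11-05"}},
    {"_source": {"entity_id": "b", "date": "2023-11-04"}},
    {"_source": {"entity_id": "a", "date": "2023-11-01"}},
]
latest = keep_latest_per_id(hits)
```

This keeps one entry per ID in memory, but every older version still has to be transferred and skipped.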

The first solution does not seem suitable, since I may be dealing with very large datasets, which is what the scroll API seems to be made for.
But scrolling does not seem convenient either, given that I can have hundreds of versions for each ID, dating back years before date A, when I only need the most recent one.

What would be the best approach for this use case?
Thanks for any hints, maybe I'm not headed in the right direction here!

