Bulk Data using Scan & Scroll API

code_blue · September 14, 2015, 11:42am

We are trying to fetch all documents inside a mapping in batches using the Scan & Scroll API.

Is it possible for the same document to be sent more than once? In other words, can duplicate records be returned when fetching data with this API?

Mark_Harwood · September 14, 2015, 12:08pm

You shouldn't get duplicates under normal circumstances.
Part of this API's contract is that it preserves a "point in time" view of the data and shields you from subsequent updates, but only if you pass back the scroll ID it gave you as the context.

Is it possible you are failing to provide the ID in subsequent calls?

code_blue · September 14, 2015, 12:16pm

So each time, we should use the new scroll ID returned in the previous response?

Mark_Harwood · September 14, 2015, 12:19pm

Yep:

The initial search request and each subsequent scroll request returns a new _scroll_id — only the most recent _scroll_id should be used.
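A minimal sketch of that loop in Python, assuming the official elasticsearch-py client, whose `search(index=..., scroll=..., size=...)`, `scroll(scroll_id=..., scroll=...)` and `clear_scroll(scroll_id=...)` calls it mirrors. The names `scroll_all` and `FakeScrollClient`, the index name and the batch size are hypothetical. The fake client stands in for a cluster so the loop runs without one, and it is deliberately stricter than Elasticsearch: it rejects any scroll ID except the most recent one. The loop stops on the first empty scroll page rather than on an empty initial page, because with the 2015-era `search_type=scan` the initial response carried no hits.

```python
import itertools


def scroll_all(client, index, batch_size=500, keep_alive="2m"):
    """Yield every hit once, always passing the newest _scroll_id back."""
    resp = client.search(index=index, scroll=keep_alive, size=batch_size)
    scroll_id = resp["_scroll_id"]
    try:
        # The initial page may be empty (e.g. old scan-type searches).
        yield from resp["hits"]["hits"]
        while True:
            resp = client.scroll(scroll_id=scroll_id, scroll=keep_alive)
            scroll_id = resp["_scroll_id"]  # switch to the most recent ID
            hits = resp["hits"]["hits"]
            if not hits:
                break
            yield from hits
    finally:
        # Release the server-side context instead of waiting for keep_alive.
        client.clear_scroll(scroll_id=scroll_id)


class FakeScrollClient:
    """Hypothetical in-memory stand-in: every response issues a fresh
    scroll ID and the previous ID stops working (stricter than real ES)."""

    def __init__(self, docs):
        self._docs = list(docs)
        self._cursors = {}  # scroll_id -> offset of the next batch
        self._size = 0
        self._counter = itertools.count()

    def _page(self, pos):
        batch = self._docs[pos:pos + self._size]
        new_id = f"sid-{next(self._counter)}"
        self._cursors[new_id] = pos + len(batch)
        return {"_scroll_id": new_id,
                "hits": {"hits": [{"_id": d} for d in batch]}}

    def search(self, index, scroll, size):
        self._size = size
        return self._page(0)

    def scroll(self, scroll_id, scroll):
        pos = self._cursors.pop(scroll_id)  # KeyError if the ID is stale
        return self._page(pos)

    def clear_scroll(self, scroll_id):
        self._cursors.pop(scroll_id, None)


if __name__ == "__main__":
    fake = FakeScrollClient(range(7))
    ids = [hit["_id"] for hit in scroll_all(fake, "my-index", batch_size=3)]
    print(ids)
```

Against a real cluster you would pass an `Elasticsearch(...)` instance in place of the fake; the loop itself does not change.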

Topic	Category	Replies	Views	Last activity
ES Scroll is returning duplicate/same results	Elasticsearch (docker)	8	551	June 25, 2021
Do unique/reusable _scroll_ids exist?	Elasticsearch	4	1511	July 6, 2017
Scroll api or search_after giving duplicate records	Elasticsearch	5	2098	March 13, 2020
Scroll API with slice gives duplicated entry	Elasticsearch	1	631	September 18, 2018
Different scroll id for each subsequent scroll request?	Elasticsearch	7	517	May 25, 2020
