Bulk Data using Scan & Scroll API


#1

We are trying to fetch all documents inside a mapping in batches using the Scan & Scroll API.

Is it possible for the same document to be sent more than once? In other words, can duplicate records be returned while fetching data with this API?


(Mark Harwood) #2

You shouldn't get duplicates under normal circumstances.
Part of the contract in this API is that it preserves a "point in time" view of the data for your use and keeps you from seeing subsequent updates - but only if you pass back, as the context, the scroll ID it provided you.

Is it possible you are failing to provide the ID in subsequent calls?


#3

So every time, we should use the new scroll ID returned in the previous response?


(Mark Harwood) #4

Yep:

The initial search request and each subsequent scroll request returns a new _scroll_id — only the most recent _scroll_id should be used.

https://www.elastic.co/guide/en/elasticsearch/reference/current/search-request-scroll.html#scroll-search-context
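
For illustration, here is a minimal Python sketch of that rule. It is not tied to any particular client library: `start_search` and `next_page` are hypothetical callables standing in for the initial search request and the follow-up scroll request. The only thing assumed about the responses is the usual shape with a top-level `_scroll_id` and the documents under `hits.hits`. The first page is handled separately so the loop also copes with a first response that carries no hits.

```python
from typing import Any, Callable, Dict, Iterator

Response = Dict[str, Any]


def scroll_all(
    start_search: Callable[[], Response],
    next_page: Callable[[str], Response],
) -> Iterator[Dict[str, Any]]:
    """Yield every hit, always passing on the most recent _scroll_id.

    start_search and next_page are hypothetical stand-ins for the
    initial search call and the scroll call of whatever client you use.
    """
    response = start_search()
    # The first page may be empty; yield whatever it contains.
    yield from response["hits"]["hits"]

    while True:
        # Always take the scroll ID from the latest response,
        # never a saved copy from an earlier one.
        response = next_page(response["_scroll_id"])
        hits = response["hits"]["hits"]
        if not hits:
            # An empty page means the scroll is exhausted.
            break
        yield from hits
```

If your own loop keeps sending the ID from the first response, or drops the ID altogether and re-runs the search, that would be the first place to look when tracking down duplicates.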

