I am using scan-and-scroll to pull some lightweight metadata fields from documents (e.g., Name, Hash, RegistrationNumber; these fields are mapped with 'store = true').
Scenario: suppose the client crashes or disconnects for some valid reason while fetching a batch of results, or the application crashes before the batch is processed. I cannot find any way to re-fetch the documents that were returned by the last call but never reached the user or got processed.
-> Given that the scroll ID remains the same throughout the scroll, there is no way to fall back to the last batch fetched.
-> Impact: some documents silently go missing with no fallback. This hurts most when the scroll is nearly complete and only a few batches remain (say 4 million docs have been fetched out of 4.1 million).
-> It would be great if the user could at least re-fetch the last batch that was attempted; that way there would be certainty about whether the last batch reached the user successfully before moving on. Before validating this behavior, I was expecting a unique scroll ID on every call, which I could pass back to Elasticsearch to confirm that the previous response was received over the wire and that I am ready for the next batch. In turn, ES would either serve the previously requested data again, or throw an exception because the token had already been used.
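In the meantime, one workaround is to keep the resume point on the client side rather than relying on the scroll cursor: record the sort key of the last fully *processed* document, and advance that checkpoint only after the batch is safely handled. A minimal sketch of the idea follows, assuming a sortable field like RegistrationNumber can serve as the pagination key (in real Elasticsearch this would be search_after-style pagination; here a local list stands in for the index so the sketch is self-contained, and all helper names are hypothetical):

```python
# Client-side checkpoint/resume sketch (hypothetical helpers, no real ES calls).
# A local list stands in for the index; RegistrationNumber is the sort key.

DOCS = [{"Name": f"doc-{i}", "RegistrationNumber": i} for i in range(10)]

def fetch_page(after, size=3):
    """Stand-in for a search_after-style query: return up to `size` docs
    whose sort key is strictly greater than `after`."""
    page = [d for d in DOCS if d["RegistrationNumber"] > after]
    return page[:size]

def scroll_with_checkpoint(checkpoint):
    """Resume from the last durable checkpoint; commit the checkpoint only
    AFTER a batch has been fully processed, so a crash replays the batch
    instead of silently losing it."""
    processed = []
    while True:
        batch = fetch_page(checkpoint["last"])
        if not batch:
            break
        for doc in batch:                # process the whole batch first...
            processed.append(doc["Name"])
        checkpoint["last"] = batch[-1]["RegistrationNumber"]  # ...then commit

    return processed

# Simulate a crash: a batch is delivered but the client dies before
# processing it, so the checkpoint is never advanced.
ckpt = {"last": -1}
lost_batch = fetch_page(ckpt["last"])    # fetched, then "lost" in the crash
rerun = scroll_with_checkpoint(ckpt)     # checkpoint unchanged -> batch replayed
print(rerun)                             # every document, none missing
```

The key design choice is that the checkpoint moves only after processing succeeds, so the worst case after a crash is re-processing one batch, never losing one.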
One more question: does a large document size affect scroll time even when the requested fields are very lightweight? I am seeing performance degradation for the same fields fetched from two different mappings, one of which has some extra heavy fields (which are not being fetched).
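For what it's worth, since these fields are mapped with 'store = true', the scroll request can ask for the stored fields directly and disable `_source` loading entirely, so the heavy `_source` document is never fetched or parsed. A request body along these lines (field names taken from the question) would look like:

```json
{
  "size": 1000,
  "stored_fields": ["Name", "Hash", "RegistrationNumber"],
  "_source": false,
  "query": { "match_all": {} }
}
```

If the slow mapping was being read via `_source` filtering rather than stored fields, that alone could explain the degradation, since `_source` filtering still loads and parses the full source document before discarding the unwanted fields.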