I use a scroll query, issued via Nest, to iterate page by page through a large number of documents in an index. The batch size is set to 100, so each scroll page returns N * batch_size documents, where N is the number of shards in the index.
If the index has 10 shards, each scroll call returns 1000 documents for every page until near the end of the process, when the scroll starts returning fewer documents per call because some of the shards have no more results.
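To make the expected page sizes concrete, here is a small Python sketch (not NEST code) of the arithmetic. It assumes per-shard sizing, which is how `size` behaves with `search_type=scan` in ES 1.x (with a plain scroll, `size` is the total per page instead). The per-shard document counts are made-up example values, and `expected_page_sizes` is a hypothetical helper.

```python
from typing import List

def expected_page_sizes(docs_per_shard: List[int], batch_size: int) -> List[int]:
    """Return the number of hits each scroll page should contain, assuming
    every shard contributes up to batch_size documents per page until it
    runs out (per-shard sizing, as with search_type=scan in ES 1.x)."""
    remaining = list(docs_per_shard)
    pages = []
    while any(r > 0 for r in remaining):
        page_hits = 0
        for i, r in enumerate(remaining):
            take = min(batch_size, r)  # a shard never returns more than it has left
            remaining[i] = r - take
            page_hits += take
        pages.append(page_hits)
    return pages

# 10 shards holding 250 documents each, batch size 100:
print(expected_page_sizes([250] * 10, 100))  # [1000, 1000, 500]
```

Only the final page(s) should shrink, and only because shards are exhausted; a drop from 1000 to 800 while every shard still has documents left does not fit this model.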
However, on some occasions, the scroll calls start returning fewer documents relatively early in the enumeration. For example, they abruptly start returning 800 documents instead of 1000. No scroll error is returned for any of the shards in the search response object, and no shards are marked as failed. The response to the first call that returns fewer documents does indicate that the number of successful shards (e.g. 8) is lower than the total number of shards (e.g. 10), but the number of failed shards is 0. In subsequent scroll calls the API keeps returning the lower number of documents (e.g. 800) until the scroll completes, and the total number of documents processed ends up lower than the expected count.
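One way to catch the drop as it happens is to inspect the `_shards` section of every scroll response instead of only counting hits. The Python sketch below works on plain dicts shaped like ES 1.x JSON search responses (`_shards.total`, `_shards.successful`, `_shards.failed`, `hits.total`, `hits.hits`); how the responses are fetched (NEST, raw HTTP) is left out, and the function names and sample responses are hypothetical.

```python
from typing import Any, Dict, Iterable, List

def find_silent_shard_drops(pages: Iterable[Dict[str, Any]]) -> List[Dict[str, int]]:
    """Return one record per scroll page where fewer shards succeeded than
    were queried, even though no shard was reported as failed."""
    suspicious = []
    for index, page in enumerate(pages):
        shards = page["_shards"]
        missing = shards["total"] - shards["successful"]
        if missing > 0 and shards.get("failed", 0) == 0:
            suspicious.append({
                "page": index,
                "total": shards["total"],
                "successful": shards["successful"],
                "hits": len(page["hits"]["hits"]),
            })
    return suspicious

def shortfall(pages: List[Dict[str, Any]]) -> int:
    """hits.total from the first page minus the documents actually received
    across all pages (0 means nothing was lost)."""
    if not pages:
        return 0
    expected = pages[0]["hits"]["total"]
    received = sum(len(page["hits"]["hits"]) for page in pages)
    return expected - received

# Hypothetical helper building illustrative responses: 10 shards queried.
def fake_page(successful: int, hits: int) -> Dict[str, Any]:
    return {"_shards": {"total": 10, "successful": successful, "failed": 0},
            "hits": {"total": 3000, "hits": [{}] * hits}}

pages = [fake_page(10, 1000), fake_page(8, 800), fake_page(10, 800)]
print(find_silent_shard_drops(pages))  # [{'page': 1, 'total': 10, 'successful': 8, 'hits': 800}]
print(shortfall(pages))                # 400
```

Logging these records with timestamps would make it possible to line up the drop against shard relocation events in the cluster logs. A normal end-of-scroll page (fewer hits, but `successful == total`) is not flagged.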
This looks as if a few shards became temporarily unavailable, possibly because of a relocation, and were not picked up again by the subsequent scroll calls. I haven't been able to catch this while it is happening, so I can't correlate it with a shard relocation, and the index and its replicas are healthy by the time I check.
Note that towards the end of the scroll, when the last pages return fewer documents as expected, the total number of shards and the number of successful shards in the response are equal. The mismatch appears only once, on the call where the result count drops from 1000 to 800. Again, there's no scroll error at any point in this process, and it doesn't happen every time I run it.
Any idea how to diagnose or fix this?
Is this a known bug that was fixed long ago (we are on an older version of ES, 1.7.1)?
Is it a Nest client API issue (I'm using Nest 1.9.2)?
Am I missing something?