Scroll returns inconsistent number of results

We are using Elasticsearch 5.4.0 and have noticed strange behavior from the scroll API on one specific index.
When scrolling with a simple match_all query, the number of results returned sometimes differs from the "total" reported by ES. Sometimes the scroll returns exactly the expected number, but sometimes it returns fewer.

It happens on one specific index that is no longer active (i.e. no new data is written to it). We haven't seen this problem on any other index. In fact, reindexing the data into a new index solves the problem: the new index doesn't show the same symptoms when scrolled.

Below is an example that demonstrates the problem (the script logic is pretty simple and of course takes into account the fact that the first response also contains hits). This is the first time we have seen such behavior (and we run a lot of scan & scroll queries), and I haven't found any related issue or bug that could explain it.

Any tips on how to debug this to get further information on what's going on are most welcome...

[user@localhost]/tmp> python /tmp/test_scroll.py
Expected: 240773
Total hits so far: 20000
Total hits so far: 30000
Total hits so far: 40000
Total hits so far: 50000
Total hits so far: 60000
Total hits so far: 70000
Total hits so far: 80000
Total hits so far: 90000
Total hits so far: 100000
Total hits so far: 110000
Total hits so far: 120000
Total hits so far: 130000
Total hits so far: 140000
Total hits so far: 150000
Total hits so far: 160000
Total hits so far: 170000
Total hits so far: 180000
Total hits so far: 190000
Total hits so far: 200000
Total hits so far: 210000
Total hits so far: 220000
Total hits so far: 230000
Total hits so far: 233032
Got empty hits array!
Expected: 240773
Received: 233032

Running it again five seconds later:

[user@localhost]/tmp> python /tmp/test_scroll.py
Expected: 240773
Total hits so far: 20000
Total hits so far: 30000
Total hits so far: 40000
Total hits so far: 50000
Total hits so far: 60000
Total hits so far: 70000
Total hits so far: 80000
Total hits so far: 90000
Total hits so far: 100000
Total hits so far: 110000
Total hits so far: 120000
Total hits so far: 130000
Total hits so far: 140000
Total hits so far: 150000
Total hits so far: 160000
Total hits so far: 170000
Total hits so far: 180000
Total hits so far: 190000
Total hits so far: 200000
Total hits so far: 210000
Total hits so far: 220000
Total hits so far: 230000
Total hits so far: 240000
Total hits so far: 240773
Got empty hits array!
Expected: 240773
Received: 240773
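
For reference, a counting loop that would produce output like the above can be sketched as follows. This is a minimal sketch, not the actual test_scroll.py: it assumes the official `elasticsearch` Python client for 5.x (its `search`, `scroll` and `clear_scroll` methods), and the function name, parameter names and page size of 10000 are illustrative choices only.

```python
def count_scrolled_hits(client, index, page_size=10000, keep_alive="1m"):
    """Scroll through every document in `index` and return (expected, received).

    `client` is assumed to behave like elasticsearch.Elasticsearch (5.x).
    """
    resp = client.search(
        index=index,
        scroll=keep_alive,
        size=page_size,
        body={"query": {"match_all": {}}},
    )
    expected = resp["hits"]["total"]  # a plain integer in Elasticsearch 5.x
    print("Expected: %d" % expected)

    received = 0
    scroll_id = resp["_scroll_id"]
    hits = resp["hits"]["hits"]  # the first response already contains hits

    while hits:
        received += len(hits)
        print("Total hits so far: %d" % received)
        resp = client.scroll(scroll_id=scroll_id, scroll=keep_alive)
        scroll_id = resp["_scroll_id"]  # always continue with the latest scroll id
        hits = resp["hits"]["hits"]

    print("Got empty hits array!")
    client.clear_scroll(scroll_id=scroll_id)  # release the search context
    print("Expected: %d" % expected)
    print("Received: %d" % received)
    return expected, received
```

With the real client this would be called with an `Elasticsearch` instance and the name of the affected index. Because the function returns both numbers, it is easy to call it in a loop and record how often the short count appears.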

Do you have any shards where the document count differs between the primary and replica?

Based on the output of _stats/?level=shard, there are no such shards.
All primary and replica shards have the same number of documents (based on docs.count). There are, however, differences in the number of deleted documents between a primary and its replica (the primary has 0 deleted documents but the replica has some); I'm not sure whether that is related.
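
The primary/replica comparison can be automated rather than read off by eye. The sketch below assumes the indices stats response has already been parsed into a dict, and that it has the shard-level layout (`indices.<index>.shards.<shard id>` as a list of copies, each with `routing` and `docs` sections). The function name and the shape of the returned summary are made up for illustration.

```python
def summarize_shard_copies(stats, index):
    """Compare docs.count and docs.deleted across the copies of each shard.

    `stats` is the parsed JSON body of a shard-level indices stats request.
    Returns {shard_id: {"copies": [...], "count_differs": bool,
                        "deleted_differs": bool}}.
    """
    summary = {}
    for shard_id, copies in stats["indices"][index]["shards"].items():
        rows = [
            {
                "primary": copy["routing"]["primary"],
                "node": copy["routing"]["node"],
                "count": copy["docs"]["count"],
                "deleted": copy["docs"]["deleted"],
            }
            for copy in copies
        ]
        summary[shard_id] = {
            "copies": rows,
            # More than one distinct value means the copies disagree.
            "count_differs": len({row["count"] for row in rows}) > 1,
            "deleted_differs": len({row["deleted"] for row in rows}) > 1,
        }
    return summary
```

Shards where `deleted_differs` is true but `count_differs` is false correspond to the situation described here: same live document count, different number of deleted documents.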

It's also worth mentioning that the number of returned results is always one of two values: either 240773 (the correct number) or 233032, never anything else.
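
Since the short run always falls short by the same amount (240773 - 233032 = 7741 documents), one way to narrow things down is to diff the document IDs from a complete run against those from a short run. The sketch below assumes the hits of each run have been collected into lists of hit dicts with an `_id` field. The function name is hypothetical, and it also checks for duplicates in case the short run is repeating documents rather than only dropping them.

```python
from collections import Counter


def compare_scroll_runs(complete_hits, short_hits):
    """Return (missing_ids, duplicated_ids) for a short run vs. a complete run.

    missing_ids:    IDs seen in the complete run but absent from the short run.
    duplicated_ids: IDs that the short run returned more than once.
    """
    complete_ids = Counter(hit["_id"] for hit in complete_hits)
    short_ids = Counter(hit["_id"] for hit in short_hits)

    missing = sorted(set(complete_ids) - set(short_ids))
    duplicated = sorted(doc_id for doc_id, n in short_ids.items() if n > 1)
    return missing, duplicated
```

If the same IDs are missing on every short run, that would point at a fixed set of documents (for example, one shard copy) rather than at a random effect.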

Any additional insights? I'm still seeing the same issue on the same index (and only on that index).
