Incomplete results for scan / scroll searches

I have a small Java-based test that exercises the scan / scroll API. It
basically works, but I always seem to be missing a single element in the
last scroll response. Conceptually, this is what I am doing:

  1. Index 1000 documents and run an index refresh (the index has the
    default of 5 shards)
  2. Run a 'matchAll' search with search type = 'scan', size=100 and a
    timeout of 30 seconds
  3. Run a loop of scroll requests, each using the previous scroll id, until
    I get no more hits

The total number of hits is correctly reported as 1000. The first scroll
response carries the expected 500 hits (number of shards * 100). However,
the second (last) scroll response only has 499 hits, so it seems that the
last document is missing. Has anyone observed a similar issue?

Update

The problem was most likely on my side. I expected each scroll to have 500
elements. What I actually get for the 1000 elements is 3 responses: 1 = 500
hits, 2 = 499 hits, 3 = 1 hit. I get all 1000 elements so there seems to be
no bug. I still find the sizes a little strange ... maybe someone can shed
some light on this.

Hi Jan

If you request $size results when scanning, it gives you a maximum of
$size results from each shard.

So if $size == 10, and you have 5 shards, you could get a maximum of 50
results per request. The actual number will vary, depending on how the
documents are distributed across the shards.

For example, if you have a total of 25 documents, on two shards, but 20
of them are on shard 1, then you would get:

  • first request: 15 results
    • 10 from shard 1
    • 5 from shard 2
  • second request: 10 results
    • 10 from shard 1

clint