I have a small Java-based test that plays with the scan/scroll API. It
basically works, but I am always missing a single element in the last
scroll response. This is what I am doing conceptually:
- Index 1000 documents and run an index refresh (the index has the default
of 5 shards)
- Run a 'matchAll' search with search type = 'scan', size=100 and a
timeout of 30 seconds
- Run a loop of scroll requests, each with the previous scroll id, until I
get no more hits
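The loop in the last step can be sketched as follows. This is a minimal, self-contained sketch, not Elasticsearch client code: `Scroller` and `fakeScroller` are hypothetical stand-ins for "send a scroll request with the previous scroll id and read the hits", so the example runs without a cluster. What it illustrates is the termination condition: keep scrolling until a response comes back with zero hits, without assuming that every response is full.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;

public class ScrollLoopSketch {

    // Hypothetical stand-in for one scroll round trip. In a real test this
    // would send the previous scroll id and return the hits of the response.
    interface Scroller {
        List<Integer> nextPage();
    }

    // Keeps scrolling until a page comes back empty. A page that is smaller
    // than (number of shards * size) does NOT mean the scan is finished.
    static List<Integer> drain(Scroller scroller) {
        List<Integer> all = new ArrayList<>();
        while (true) {
            List<Integer> page = scroller.nextPage();
            if (page.isEmpty()) {
                break; // no more hits: the scan is exhausted
            }
            all.addAll(page);
        }
        return all;
    }

    // Hypothetical fake: serves pages of the given sizes, numbering documents
    // consecutively, and returns empty pages once the sizes are used up.
    static Scroller fakeScroller(int... pageSizes) {
        Iterator<Integer> sizes = Arrays.stream(pageSizes).iterator();
        int[] nextId = {0};
        return () -> {
            if (!sizes.hasNext()) {
                return Collections.emptyList();
            }
            int n = sizes.next();
            List<Integer> page = new ArrayList<>();
            for (int i = 0; i < n; i++) {
                page.add(nextId[0]++);
            }
            return page;
        };
    }

    public static void main(String[] args) {
        // Page sizes as reported later in this thread: 500, 499, then 1.
        List<Integer> hits = drain(fakeScroller(500, 499, 1));
        System.out.println(hits.size()); // prints 1000
    }
}
```

With this stopping rule, a short page in the middle of the scan is simply added to the result and the loop asks for the next one.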
I found that the total number of hits is correctly reported as 1000. The
first scroll response carries the expected 500 hits (number of shards *
100). However, the second (last) scroll response has only 499 hits. It
seems that the last document is missing. Has anyone observed similar issues?
Update
The problem was most likely on my side. I expected each scroll response to
have 500 elements. What I actually get for the 1000 documents is 3
responses: the first with 500 hits, the second with 499 hits and the third
with 1 hit. I get all 1000 documents, so there seems to be no bug. I still
find the batch sizes a little strange; maybe someone can shed some light on
this.
Hi Jan
If you request $size results when scanning, each scroll request gives you a
maximum of $size results from each shard.
So if $size == 10 and you have 5 shards, you could get a maximum of 50
results per request. The actual number will vary, depending on how many
documents each shard still has left to return.
For example, if you have a total of 25 documents on two shards, but 20 of
them are on shard 1, then you would get:
- first request: 15 results:
  - 10 from shard 1
  - 5 from shard 2
- second request: 10 results (the remaining 10 from shard 1)
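The rule above can be checked with a small simulation: in each round, every shard contributes min($size, documents it still has), and the round's hit count is the sum over shards. This is a toy model written for illustration, not Elasticsearch code. The 201/200/200/200/199 split of Jan's 1000 documents is a hypothetical assumption (the real per-shard counts depend on routing and are not known from this thread); it is simply one distribution that reproduces the 500/499/1 pattern.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ScanBatchSizes {

    // Toy model of scan-style scrolling: each round takes up to `size`
    // documents from every shard that still has some left. Returns the hit
    // count of each non-empty round. Assumes size > 0.
    static List<Integer> roundSizes(int size, int... docsPerShard) {
        int[] remaining = Arrays.copyOf(docsPerShard, docsPerShard.length);
        List<Integer> rounds = new ArrayList<>();
        while (true) {
            int hits = 0;
            for (int s = 0; s < remaining.length; s++) {
                int take = Math.min(size, remaining[s]);
                remaining[s] -= take;
                hits += take;
            }
            if (hits == 0) {
                return rounds; // an empty round means the scan is exhausted
            }
            rounds.add(hits);
        }
    }

    public static void main(String[] args) {
        // The example above: 25 docs on two shards (20 + 5), $size = 10.
        System.out.println(roundSizes(10, 20, 5)); // prints [15, 10]
        // Hypothetical split of 1000 docs over 5 shards, size = 100.
        System.out.println(roundSizes(100, 201, 200, 200, 200, 199)); // prints [500, 499, 1]
    }
}
```

With a perfectly even split of 200 documents per shard the same model gives two full rounds of 500; any imbalance pushes the leftover documents into a later, smaller round.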
clint