Small difference in the number of results with scroll

Hi,
I have a Java job that uses scroll to export a large amount of data from ES.
The problem is that every time I run the job with the same input parameters (time_start and time_end), I get a slightly different number of rows.

It could also be my code, but so far I do not see a problem there.
So I am wondering: could it be some sort of scoring issue?

My query is basically a bool with range and terms clauses.

The difference is not big, but I still want to know what is going on.

For example, here are 7 runs (the first number is the number of lines):

96935 /data/tmp/test_20190415_20190416.csv0
96935 /data/tmp/test_20190415_20190416.csv1
96931 /data/tmp/test_20190415_20190416.csv2
96931 /data/tmp/test_20190415_20190416.csv3
96931 /data/tmp/test_20190415_20190416.csv4
96931 /data/tmp/test_20190415_20190416.csv5
96935 /data/tmp/test_20190415_20190416.csv6

Is your index changing, or is it static?

If no indexing operations are happening, your query is correct, and there is no bug on your side, the results should be consistent.
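
To narrow it down, it may help to find out exactly which rows are missing from the shorter exports. Here is a minimal sketch, assuming each exported line is a unique row; the class name `ExportDiff` and the method `onlyIn` are made up for illustration. It compares rows as a set, so it would not detect rows that are merely duplicated.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper, not part of any Elasticsearch client.
class ExportDiff {

    // Returns the rows that appear in `left` but not in `right`, in their original order.
    // Assumption: rows are unique, so duplicate counts are ignored.
    static List<String> onlyIn(List<String> left, List<String> right) {
        Set<String> seen = new HashSet<>(right);
        List<String> missing = new ArrayList<>();
        for (String row : left) {
            if (!seen.contains(row)) {
                missing.add(row);
            }
        }
        return missing;
    }

    // Usage: java ExportDiff <longer-export> <shorter-export>
    public static void main(String[] args) throws IOException {
        List<String> longer = Files.readAllLines(Paths.get(args[0]), StandardCharsets.UTF_8);
        List<String> shorter = Files.readAllLines(Paths.get(args[1]), StandardCharsets.UTF_8);
        for (String row : onlyIn(longer, shorter)) {
            System.out.println(row);
        }
    }
}
```

Running it on one 96935-line file and one 96931-line file should print the rows that only the longer export contains, which you can then look up in the index directly.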

It's possible that the different scrolls are hitting different shard copies which have ended up with different sets of documents. This is something that we found and fixed in 6.3.0. What version are we talking about here?

Also, how many replicas are there in this index?

Does the discrepancy go away if you add a custom search preference such as ?preference=xyzabc123?

Thank you for the answers.
I am running version 6.7.

I have 2 replicas configured.
And the data is static, which is exactly why I am digging into this.

I will give the custom preference a try once I find out how to set it in Java.
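
For reference, the preference goes on the initial search request that opens the scroll, as a `preference` query parameter next to `scroll`. Below is a minimal sketch that only builds that request path with the standard library; the class `ScrollSearchPath`, the index name `my-index`, and the `1m` keep-alive are assumptions for illustration. If the job uses the Java High Level REST Client, `SearchRequest` should have a `preference(String)` setter that does the same thing, but that is worth checking against the client documentation for 6.7.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical helper that builds the path of the first scroll search request.
class ScrollSearchPath {

    // Builds "/<index>/_search?scroll=<keepAlive>&preference=<preference>".
    // The same preference string should be reused on every run of the job.
    static String initialSearchPath(String index, String keepAlive, String preference) {
        return "/" + index + "/_search"
                + "?scroll=" + encode(keepAlive)
                + "&preference=" + encode(preference);
    }

    // URL-encodes a query parameter value as UTF-8.
    private static String encode(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 support is required on every Java platform.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // "my-index" is a placeholder index name.
        System.out.println(initialSearchPath("my-index", "1m", "xyzabc123"));
        // prints /my-index/_search?scroll=1m&preference=xyzabc123
    }
}
```

If the row counts become stable with a fixed preference string, that would point to the shard copies holding different sets of documents rather than to a problem in the export code.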
