Hi,
I have a Java job that uses scroll to export a large amount of data from ES.
The problem is that every time I run the job with the same input params (time_start and time_end), I get a slightly different number of rows.
It could also be my code, but so far I do not see a problem there.
So I am wondering: could it be some sort of scoring issue?
My query is basically a bool query with range and terms clauses.
The difference is not big, but I still want to know what is going on.
For example, here are 6 runs (the first number is the number of lines):
It's possible that the different scrolls are hitting different shard copies which have ended up with different sets of documents. This is something that we found and fixed in 6.3.0. What version are we talking about here?
Also, how many replicas are there in this index?
Does the discrepancy go away if you add a custom search preference such as ?preference=xyzabc123?
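
To make the suggestion concrete, here is a minimal sketch of how the initial scroll request URL could carry a fixed preference string. It uses only the JDK and just builds the URL; the host, index name, keep-alive value and the helper name buildScrollSearchUri are assumptions for illustration, not taken from the original job. The idea is that an arbitrary but constant preference value should route every run to the same shard copies.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

class ScrollPreferenceExample {

    // Hypothetical helper: builds the URL of the initial scroll search request.
    // "scroll" and "preference" are query parameters of the search API; the
    // preference value is an arbitrary string that stays the same across runs.
    static URI buildScrollSearchUri(String baseUrl, String index,
                                    String keepAlive, String preference) {
        String query = "scroll=" + URLEncoder.encode(keepAlive, StandardCharsets.UTF_8)
                + "&preference=" + URLEncoder.encode(preference, StandardCharsets.UTF_8);
        return URI.create(baseUrl + "/" + index + "/_search?" + query);
    }

    public static void main(String[] args) {
        // Assumed host and index name, for illustration only.
        URI uri = buildScrollSearchUri("http://localhost:9200", "my-index", "1m", "xyzabc123");
        System.out.println(uri);
    }
}
```

If the row counts become stable with a fixed preference, that would point to the shard copies having diverged rather than to a bug in the export code.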