We've noticed some strange behavior in Elasticsearch during paging.
In one case we use a page size of 60 and we have 63 documents. So the
first page uses size 60 and offset 0, and the second page uses size 60
and offset 60. What we see is that the results are inconsistent: on the
2nd page, we sometimes get results that already appeared on the 1st page.
The query sorts by a numeric field that has many documents with the same
value (0).
It looks like the ordering between documents that share the same sort
value, which is 0, isn't consistent.
Did anyone encounter such behavior? Any suggestions on resolving this?
The cause of this issue is that Elasticsearch uses Lucene's internal doc
IDs as tie-breakers. Internal doc IDs might be completely different across
replicas of the same data, so this explains why documents that have the
same sort values are not consistently ordered.
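To make the tie-breaking explanation concrete, here is a small simulation. It is only a sketch: the two "replicas" are plain dictionaries, and the doc names and ID assignment are made up for illustration, not taken from Elasticsearch or Lucene. It assumes every document has the same sort value (0) and that ties are broken by a per-replica internal doc ID.

```python
def make_replica(doc_names):
    """Assign hypothetical internal doc IDs in this replica's own order."""
    return {name: doc_id for doc_id, name in enumerate(doc_names)}


def page(replica, sort_values, offset, size):
    """Sort by field value, break ties by the replica's internal doc ID."""
    ordered = sorted(replica, key=lambda name: (sort_values[name], replica[name]))
    return ordered[offset:offset + size]


docs = [f"doc{i}" for i in range(63)]
sort_values = {name: 0 for name in docs}  # every document ties on the sort field

# Same data, but the two copies assigned internal doc IDs in different orders.
replica_a = make_replica(docs)
replica_b = make_replica(reversed(docs))

# Page 1 answered by replica A, page 2 answered by replica B.
mixed_overlap = set(page(replica_a, sort_values, 0, 60)) & set(
    page(replica_b, sort_values, 60, 60)
)

# Both pages answered by the same replica.
same_overlap = set(page(replica_a, sort_values, 0, 60)) & set(
    page(replica_a, sort_values, 60, 60)
)

print(sorted(mixed_overlap))  # ['doc0', 'doc1', 'doc2']
print(sorted(same_overlap))   # []
```

When the two pages are answered by different copies, three documents from page 1 show up again on page 2. When both pages come from the same copy, the pages are disjoint.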
There are 2 potential ways to fix that problem:
1. Use scroll, as David mentioned. It will create a context around your
request and will make sure that the same shards are used for all pages.
However, it also gives another guarantee, which is that the same
point-in-time view of the index is used for each page, and this is
expensive to maintain.
2. Use a custom string value as a preference in order to always hit the
same shards for a given session[1]. This helps with always hitting the
same shards, like option 1, but without the additional cost of a scroll.
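As a rough sketch of the preference approach, the helper below builds the path and JSON body for a paged search that passes the same custom `preference` string on every page. The function name, index name, session ID, and sort field are all hypothetical. Nothing is sent over the network, so how you actually issue the request depends on your client.

```python
import json
from urllib.parse import urlencode


def build_search_request(index, session_id, page_number, page_size, sort_field):
    """Return (path, body) for one page of results, pinned by session."""
    # The same preference string on every page keeps a session on the same
    # shard copies, so ties are broken the same way each time.
    query_string = urlencode({"preference": session_id})
    path = f"/{index}/_search?{query_string}"
    body = {
        "from": page_number * page_size,  # offset of the first hit on this page
        "size": page_size,
        "sort": [{sort_field: "asc"}],
        "query": {"match_all": {}},
    }
    return path, json.dumps(body)


# Hypothetical usage: second page (page_number=1) of 60 results per page.
path, body = build_search_request("items", "session-42", 1, 60, "rank")
print(path)
print(body)
```

Any stable per-user value, such as a session ID, should work as the preference string. What matters is that it stays the same across all pages of one paging session.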
Thanks for the answer and sorry for the duplicate (posted from a different
source by mistake)
On Monday, August 18, 2014 11:02:47 AM UTC+3, Adrien Grand wrote:
On Mon, Aug 18, 2014 at 8:02 AM, Ron Sher <ron....@gmail.com> wrote: