On Tuesday, August 27, 2013 9:34:25 AM UTC+2, Patrick Peschlow wrote:
Hi all,
we are running Elasticsearch with one primary shard and three replicas.
Search queries are randomly distributed to these replicas (i.e., the
default - we don't have any "preference" set). All replicas have the same
number of documents.
Now if we execute the same search query four times, in many cases we
receive four different results where the scores differ (and, because of
that, the order of hits returned sometimes also changes). It seems that
each replica computes different document scores. One example of the _score
field of the same hit in four search requests: 11.18703, 11.106314,
11.079036, 10.929455. If we continue to send the same request, the score
returned is always one of these four.
My question is: Assuming that the documents on the replicas are identical,
can you think of any circumstances in which this kind of different scoring
can happen? Or is the only explanation for this that replication somehow
failed and the document content is not the same on all nodes?
Thanks for any ideas,
Patrick
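The question notes that no "preference" is set, so each request may land on a different shard copy. As a minimal sketch (not something proposed in this thread), a client could send the same custom preference string with every repeat of a query, which Elasticsearch uses to route those requests to the same copy as long as the cluster state does not change. The helper name `build_search_url`, the host, and the index name `articles` are hypothetical.

```python
from urllib.parse import urlencode


def build_search_url(base_url, index, preference=None, explain=False):
    """Build a _search URL with optional `preference` and `explain` parameters."""
    params = {}
    if preference is not None:
        # Requests sharing the same custom preference string are routed
        # to the same shard copy while the cluster state is unchanged.
        params["preference"] = preference
    if explain:
        # Ask for a per-hit score explanation in the response.
        params["explain"] = "true"
    url = f"{base_url.rstrip('/')}/{index}/_search"
    return f"{url}?{urlencode(params)}" if params else url


# Hypothetical host, index, and session identifier.
print(build_search_url("http://localhost:9200", "articles",
                       preference="user-42-session"))
```

This only makes repeated requests consistent with each other; it does not make the copies agree on a score.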
On Tuesday, August 27, 2013 12:40:05 PM UTC+2, simonw wrote:
I think this should give you some idea why this happens
simon
Thanks for the pointer. I'm still not sure I understand the implications for
score calculation in my case. I'll post a reply to the discussion in the
GitHub issue you cited.
Thanks RKM for the idea, but I don't think the "search type" you mention
applies in my case. In my situation, the search is not "distributed" in the
sense that multiple nodes work together on the same request. It is just
that the search request is routed to different replicas for load balancing,
but each search query is handled by a single replica only.
On Wednesday, August 28, 2013 1:59:02 AM UTC+2, RKM wrote:
Look at the query explanation to see what differs.
Most likely, google DFS_QUERY_THEN_FETCH.
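To make "see what differs" concrete, here is a minimal sketch that walks two score explanations for the same hit and reports the nodes whose values disagree. It assumes each explanation is a nested dict with `value`, `description`, and `details` keys, which is the shape Elasticsearch returns when a search is run with `explain` enabled. The function name and the sample trees are hypothetical and the numbers are made up for illustration.

```python
def diff_explanations(a, b, path="score"):
    """Yield (path, value_a, value_b) for every node whose value differs."""
    if a.get("value") != b.get("value"):
        yield (path, a.get("value"), b.get("value"))
    details_a = a.get("details") or []
    details_b = b.get("details") or []
    # Compare children pairwise; this assumes both trees have the same shape.
    for child_a, child_b in zip(details_a, details_b):
        child_path = f"{path} > {child_a.get('description', '?')}"
        yield from diff_explanations(child_a, child_b, child_path)


# Hypothetical explanations of one hit, as returned by two shard copies.
from_copy_a = {"value": 3.0, "description": "product of:", "details": [
    {"value": 1.0, "description": "tf", "details": []},
    {"value": 3.0, "description": "idf", "details": []},
]}
from_copy_b = {"value": 2.0, "description": "product of:", "details": [
    {"value": 1.0, "description": "tf", "details": []},
    {"value": 2.0, "description": "idf", "details": []},
]}

for path, value_a, value_b in diff_explanations(from_copy_a, from_copy_b):
    print(path, value_a, value_b)
```

In this made-up example only the idf node and the total differ, which would point at index-level statistics rather than at the document itself.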
That is why I mentioned that you should look into the explanation. It will
tell you which statistic Lucene used to calculate that score.
Often it will be term frequency, for example. Unless YOU EXPRESSLY STATE
OTHERWISE VIA THE SEARCH TYPE you get a per-shard term frequency, which is
different from the actual term frequency, which is only correct when
calculated across all shards.
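A small worked example of how a locally computed statistic changes a score. This is a sketch that assumes Lucene's classic TF-IDF similarity, where idf = 1 + ln(maxDocs / (docFreq + 1)); other similarities such as BM25 use a different formula. The statistics below are hypothetical. One possible reason two copies of the same shard report different numbers (an assumption, not confirmed in this thread) is that deleted documents still count toward these statistics until segments are merged, and each copy merges on its own schedule.

```python
import math


def classic_idf(doc_freq, max_docs):
    """idf of the classic TF-IDF similarity: 1 + ln(max_docs / (doc_freq + 1))."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


# Hypothetical statistics for one term on two copies of the same shard.
copy_a = {"doc_freq": 120, "max_docs": 10_000}
copy_b = {"doc_freq": 135, "max_docs": 10_400}  # e.g. unmerged deletes still counted

for name, stats in (("copy_a", copy_a), ("copy_b", copy_b)):
    print(name, round(classic_idf(**stats), 4))
```

Because idf feeds into every matching term's weight, even a small difference in `doc_freq` or `max_docs` shifts the final score of the same document.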