Different scores on replicas with the same documents

Hi all,

we are running ElasticSearch with one primary shard and three replicas.
Search queries are randomly distributed to these replicas (i.e., the
default - we don't have any "preference" set). All replicas have the same
number of documents.

Now if we execute the same search query four times, in many cases we
receive four different results where the scores differ (and, because of
that, the order of hits returned sometimes also changes). It seems that
each replica computes different document scores. One example of the _score
field of the same hit in four search requests: 11.18703, 11.106314,
11.079036, 10.929455. If we continue to send the same request, the score
returned always is one of these four.

My question is: Assuming that the documents on the replicas are identical,
can you think of any circumstances in which this kind of different scoring
can happen? Or is the only explanation for this that replication somehow
failed and the document content is not the same on all nodes?

Thanks for any ideas,
Patrick

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

I just answered something like this here:
https://github.com/elasticsearch/elasticsearch/issues/3578

I think this should give you some idea why this happens :)

simon

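Look at the query explanation (the "explain" option of the search request)
to see what differs between the responses.

Most likely you will want to read up on the DFS_QUERY_THEN_FETCH search type.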
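A minimal sketch of what such a request could look like, built in Python without actually sending it. The host, index name, and query are hypothetical placeholders; "explain" in the request body and the "search_type" and "preference" URL parameters are the standard search options, but check them against the version you run.

```python
import json
from urllib.parse import urlencode


def build_search_request(base_url, index, query, search_type=None, preference=None):
    """Return (url, body) for a _search call with score explanations enabled."""
    params = {}
    if search_type is not None:
        params["search_type"] = search_type
    if preference is not None:
        # A custom preference string keeps repeated requests on the same shard copy.
        params["preference"] = preference
    url = f"{base_url}/{index}/_search"
    if params:
        url += "?" + urlencode(params)
    # "explain": True asks for a per-hit breakdown of how the score was computed.
    body = json.dumps({"query": query, "explain": True})
    return url, body


# Hypothetical host, index, and query, for illustration only.
url, body = build_search_request(
    "http://localhost:9200",
    "my_index",
    {"match": {"title": "example"}},
    search_type="dfs_query_then_fetch",
)
print(url)
print(body)
```

Sending the same body several times and comparing the "_explanation" of one hit across responses should show which statistic differs from copy to copy.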

Thanks for the pointer. I'm still not sure I understand the implications for
score calculation in my case, so I'll post a reply to the discussion in the
GitHub issue you cited.

Thanks RKM for the idea, but I don't think the "search type" you mention
applies in my case. In my situation, the search is not "distributed" in the
sense that multiple nodes work together on the same request. It is just
that the search request is routed to different replicas for load balancing,
but each search query is handled by a single replica only.

That is why I mentioned that you should look into the explanation. It will
tell you which statistics Lucene used to calculate that score. Often the
difference is in a term statistic such as document frequency. Unless YOU
EXPRESSLY STATE OTHERWISE VIA THE SEARCH TYPE, you get per-shard term
statistics, which differ from the actual statistics that are only correct
when calculated across all shards.
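To illustrate how differing statistics move a score, here is a small sketch assuming Lucene's classic TF-IDF similarity, where idf = 1 + ln(maxDocs / (docFreq + 1)). The docFreq and maxDocs values below are invented for illustration; the real ones appear in the explain output.

```python
import math


def classic_idf(doc_freq, max_docs):
    """IDF as defined by Lucene's classic TF-IDF similarity."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


# Hypothetical statistics reported for the same term by two shard copies.
copy_a = classic_idf(doc_freq=120, max_docs=10_000)
copy_b = classic_idf(doc_freq=135, max_docs=10_400)
print(copy_a, copy_b)
```

Even a small difference in docFreq or maxDocs changes the IDF, and with it the final score, although the matching document itself is identical.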

So tell us what you figure out.
