Document score explanation values (maxDocs ?)

SpinBoldak · June 20, 2015, 3:06pm

Hi,

I'm querying some documents with two sort attributes (the default score and one of the document attributes). I'm getting different scores for these docs and I cannot figure out why. Using the explain=true parameter, I can see some differing values, even though all the documents look the same. For example, I'm querying for "1090"; all documents have an id of the format "xxxxxxxxxx.1090.xx", an attribute "ContractNumber: 1090" and an attribute "ElasticsearchKey" that contains the document's id. Besides those 3 matches, I cannot see "1090" in any other attribute. So I was expecting to get the same score values, but there are some score variations that I cannot understand.

With the query (the original query is not this one, this is just a "reduced" form to debug/reproduce the behaviour):

{
  "query": {
    "bool": {
      "must": {
        "query_string": {
          "query": "1090"
        }
      }
    }
  }
}

The results (6 documents) have different scores. On their "_explanation" attribute, we have different descriptions:

"description": "weight(_all:1090 in 26362) [PerFieldSimilarity], result of:"
"description": "weight(_all:1090 in 11206) [PerFieldSimilarity], result of:"
"description": "weight(_all:1090 in 67244) [PerFieldSimilarity], result of:"
[...]

What are those "in xxx" values? According to the details attribute, it seems to be a "maxDocs" attribute, but how is that calculated? There are also other variations: some explanations have 2 "details" attributes, others have more. Here are the full explanation responses, if anyone would care to see them: { "took": 3, "timed_out": false, "_shards": { "total": 5, "su - Pastebin.com (I've removed the other document attributes, for cleanliness).

Any hints ?

Thank you

Ivan · June 22, 2015, 5:53am

The problem you are experiencing is due to distributed search. The IDF
values are calculated per shard, so scores can change depending on which
shard the document is located on. If you notice, the documents with the
same score are all on the same shard.

This problem normally manifests when you have a low number of documents
spread over several shards. With millions of documents, the per-shard
statistics even out and the differences become much smaller.
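
To make the per-shard effect concrete, here is a minimal sketch, assuming the classic Lucene TF-IDF similarity (idf = 1 + ln(maxDocs / (docFreq + 1))). The shard names and the docFreq/maxDocs numbers are hypothetical and are not taken from the explain output above:

```python
import math


def classic_idf(doc_freq, max_docs):
    """IDF as computed by Lucene's classic TF-IDF similarity."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


# Hypothetical per-shard statistics for the term "1090".
shards = {
    "shard_0": {"doc_freq": 2, "max_docs": 26000},
    "shard_1": {"doc_freq": 1, "max_docs": 11000},
    "shard_2": {"doc_freq": 3, "max_docs": 67000},
}

# Each shard scores with its own local statistics.
idf_by_shard = {
    name: classic_idf(s["doc_freq"], s["max_docs"]) for name, s in shards.items()
}

# Statistics combined across all shards give one shared IDF.
total_doc_freq = sum(s["doc_freq"] for s in shards.values())
total_max_docs = sum(s["max_docs"] for s in shards.values())
global_idf = classic_idf(total_doc_freq, total_max_docs)

for name, idf in idf_by_shard.items():
    print(f"{name}: idf={idf:.4f}")
print(f"global: idf={global_idf:.4f}")
```

Identical documents end up with different IDF factors purely because of the shard they live on, while the combined statistics give a single value for all of them.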

One option is to use the dfs_query_then_fetch search type, which collects
term statistics from all shards before scoring:
https://www.elastic.co/guide/en/elasticsearch/reference/current/search-request-search-type.html#dfs-query-then-fetch

There is a slight performance hit, but it should help with the problem.
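
As a rough sketch of what that request could look like, the snippet below builds the URL and body for the reduced query from the original post with search_type=dfs_query_then_fetch. The host and the index name "contracts" are assumptions for illustration only, and nothing is actually sent:

```python
import json
from urllib.parse import urlencode

BASE_URL = "http://localhost:9200"  # assumed local node
INDEX = "contracts"  # hypothetical index name


def build_search(query_text):
    """Return (url, body) for a search that uses global term statistics."""
    params = urlencode(
        {"search_type": "dfs_query_then_fetch", "explain": "true"}
    )
    url = f"{BASE_URL}/{INDEX}/_search?{params}"
    body = json.dumps(
        {
            "query": {
                "bool": {
                    "must": {"query_string": {"query": query_text}}
                }
            }
        }
    )
    return url, body


url, body = build_search("1090")
print(url)
print(body)
```

Comparing the explain output with and without the search_type parameter should show whether the score differences come from per-shard statistics.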

Cheers,

Ivan

SpinBoldak · June 22, 2015, 5:19pm

Ok, that makes sense. We'll evaluate the scenario & solutions.

Thanks.
