Inconsistent Results/Doc Scores on Query

Karl_Kevilus · September 3, 2020, 3:58pm

I'm running the same query back to back on an index, and the scores and results come back different. When I look at explain, I see:

    {
        "value": 22037,
        "description": "n, number of documents containing term",
        "details": []
    },
    {
        "value": 23498,
        "description": "N, total number of documents with field",
        "details": []
    }

vs:

    "details": [
        {
            "value": 22395,
            "description": "n, number of documents containing term",
            "details": []
        },
        {
            "value": 23744,
            "description": "N, total number of documents with field",
            "details": []
        }

What should these values represent? I have ~81k documents containing that field. I tried calling the refresh API, and the results are still mixed.

How can I keep this consistent? Shouldn't it be caching these to some extent, or is that potentially part of the problem?

Mark_Harwood · September 3, 2020, 4:14pm

If you have replicas you may be round-robining between them.
Although they may be identical in terms of documents they hold, they reorganise that content via background housekeeping (aka merges) at different times. Merging will purge any deletes and these can change the scores produced. IDF or inverse-document-frequency is a scoring factor that is based on a term's popularity and that popularity count includes deleted docs that have not yet been removed.
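
To make the effect concrete, here is a rough sketch that plugs the two sets of explain values from the question into the BM25 idf formula, ln(1 + (N - n + 0.5) / (n + 0.5)). It assumes the index uses the default BM25 similarity; the function name `bm25_idf` is made up for illustration.

```python
import math

def bm25_idf(n: int, N: int) -> float:
    """BM25 idf: ln(1 + (N - n + 0.5) / (n + 0.5)).

    n = number of documents containing the term
    N = total number of documents with the field
    """
    return math.log(1 + (N - n + 0.5) / (n + 0.5))

# n and N copied from the two explain outputs in the question
idf_first = bm25_idf(n=22037, N=23498)
idf_second = bm25_idf(n=22395, N=23744)

print(f"first response:  idf = {idf_first:.4f}")
print(f"second response: idf = {idf_second:.4f}")
```

The same term gets a different idf depending on which shard copy answers, so the same document can score differently from one request to the next. By default these counts are collected per shard rather than across the whole index, which would be consistent with N being well below the ~81k documents mentioned.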

> How can I maintain consistency
Generally, sending the same users to the same choice of replica helps avoid these subtle differences in scoring. This can be done using the preference parameter and setting it to something like the user's session ID.
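
As a minimal sketch of that suggestion, the snippet below builds a search URL that passes the session ID as the `preference` query-string parameter. The host, index name, and session ID are placeholders, and `search_url` is a hypothetical helper, not part of any client library.

```python
from urllib.parse import urlencode

def search_url(base_url: str, index: str, session_id: str) -> str:
    """Build a _search URL that routes one session to the same shard copies."""
    # Custom preference strings must not start with "_", which is
    # reserved for built-in values such as _local.
    if not session_id or session_id.startswith("_"):
        raise ValueError("preference must be non-empty and not start with '_'")
    query = urlencode({"preference": session_id})
    return f"{base_url}/{index}/_search?{query}"

# Placeholder host, index name, and session ID
url = search_url("http://localhost:9200", "my-index", "session-abc123")
print(url)
```

Requests that carry the same preference string should be served by the same shard copies as long as the cluster state does not change, so a user sees stable scores within a session.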

Karl_Kevilus · September 8, 2020, 1:53pm

After more than a week of inconsistent results, we ran a force merge, which fixed the problem. Thank you.

/_forcemerge/?only_expunge_deletes=true

Is there a way to easily identify that a _forcemerge may be necessary on an index?

What are the risks in manually forcing this merge?

Mark_Harwood · September 8, 2020, 2:09pm

We only advise force merging if you are adding no more docs to an index. It's an expensive operation - essentially rewriting the index files all over again. If you have ongoing indexing it's constantly merging anyway to help defragment the data.
If you have ongoing changes it's generally better to route users to the same choice of replica to avoid these scoring changes.
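
On the question of spotting when deleted documents are piling up, one rough signal is the ratio of `docs.deleted` to all documents in the index stats. The sketch below assumes a parsed response shaped like the output of `GET /my-index/_stats/docs`; the sample numbers are invented, and any threshold for acting on the ratio would be a judgment call rather than an official recommendation.

```python
def deleted_ratio(stats: dict) -> float:
    """Fraction of documents in primary shards that are marked deleted
    but not yet merged away."""
    docs = stats["_all"]["primaries"]["docs"]
    total = docs["count"] + docs["deleted"]  # live docs + deleted docs
    return docs["deleted"] / total if total else 0.0

# Invented sample shaped like an index stats response
sample = {"_all": {"primaries": {"docs": {"count": 81000, "deleted": 9000}}}}

ratio = deleted_ratio(sample)
print(f"{ratio:.1%} of documents are deleted but not yet merged away")
```

A high ratio only indicates that term statistics may differ noticeably between shard copies; it does not by itself mean a force merge is the right response on an index that is still being written to.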

