Inconsistent results for the same query on an index with 0 replicas

Marten · December 15, 2020, 1:20pm

Hi there,

I have a problem with inconsistent ranking of the results when submitting the same query multiple times.
I'm using ES 5.6.8 on an index with 5 primary shards.
I changed the number of replicas from 1 to 0 to rule out the "deleted documents" issue, but I still have the same problem.
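For reference, the replica change described above is a dynamic index setting. A minimal Python sketch (standard library only) that builds, but does not send, the update-settings request; the base URL `http://localhost:9200` is an assumption, and `rugsearchv22` is the index name that appears in the responses later in this thread:

```python
import json
import urllib.request

# Assumption: hypothetical cluster address; adjust to your own setup.
ES_URL = "http://localhost:9200"
INDEX = "rugsearchv22"  # index name as it appears in the search responses


def replica_settings_request(index: str, replicas: int) -> urllib.request.Request:
    """Build (without sending) a PUT /<index>/_settings request that sets number_of_replicas."""
    body = json.dumps({"index": {"number_of_replicas": replicas}}).encode("utf-8")
    return urllib.request.Request(
        f"{ES_URL}/{index}/_settings",
        data=body,
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


req = replica_settings_request(INDEX, 0)
print(req.get_method(), req.full_url)
# To apply it against a running cluster: urllib.request.urlopen(req)
```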

It seems that the score of the 1st hit drops with each subsequent search (same query), while the score of the 2nd hit stays the same.
With each search, the 1st hit's score keeps dropping until it is lower than the 2nd hit's.
When that happens, the ranking changes and the 1st hit becomes the 2nd one.

During all this time, the data in the index didn't change.

Does anybody know what this can be and how I can fix it?

Thanks,

Marten

warkolm · December 15, 2020, 11:19pm

5.x is EOL; please upgrade ASAP.

Can you show some examples of the query and the responses with changing scores?

Marten · January 8, 2021, 3:45pm

Hi Mark,

Sorry for the delay.
This is the query:
{
  "query": {
    "bool": {
      "must": [
        {
          "bool": {
            "should": [
              [
                {
                  "multi_match": {
                    "query": "promotie",
                    "fields": [
                      "d_author^1",
                      "d_body^1",
                      "d_snippetContent^1"
                    ]
                  }
                }
              ]
            ]
          }
        }
      ],
      "should": [
        {
          "bool": {
            "must": {
              "match": {
                "d_content_type": {
                  "query": "html",
                  "boost": 1000
                }
              }
            }
          }
        },
        {
          "bool": {
            "should": [
              {
                "match_phrase": {
                  "d_author.keyword^100": "promotie"
                }
              },
              {
                "match_phrase": {
                  "d_body.keyword^100": "promotie"
                }
              },
              {
                "match_phrase": {
                  "d_snippetContent.keyword^100": "promotie"
                }
              }
            ]
          }
        }
      ],
      "must_not": [],
      "filter": [
        {
          "bool": {
            "must": [
              [
                {
                  "term": {
                    "d_allow_token_document.keyword": "nosecurity"
                  }
                }
              ]
            ]
          }
        }
      ]
    }
  },
  "_source": {
    "excludes": [
      "d_body",
      "d_sublinks"
    ]
  },
  "highlight": {
    "require_field_match": false,
    "pre_tags": [
      ""
    ],
    "post_tags": [
      ""
    ],
    "fields": {
      "d_body": {
        "type": "plain",
        "fragment_size": 200,
        "number_of_fragments": 2,
        "highlight_query": {
          "match_phrase": {
            "d_body": "promotie"
          }
        }
      },
      "d_title": {
        "type": "plain",
        "fragment_size": 200,
        "number_of_fragments": 2,
        "highlight_query": {
          "match_phrase": {
            "d_title": "promotie"
          }
        }
      }
    }
  },
  "suggest": {
    "text": "promotie",
    "phrase_suggestion": {
      "phrase": {
        "field": "d_title.trigram",
        "direct_generator": [
          {
            "field": "d_title.trigram",
            "suggest_mode": "always",
            "prefix_length": 0
          }
        ]
      }
    }
  }
}

(sorry for the layout)

Marten · January 8, 2021, 3:53pm

Examples of the response:
1st search:
{
  "took": 37,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 6506,
    "max_score": 194.98152,
    "hits": [
      {
        "_index": "rugsearchv22",
        "_type": "rugsearchv22",
        "_id": "https://www.rug.nl/staff/s.s.m.peters/projects",
        "_score": 194.98152,
        "_source": { ....

2nd search:
{
  "took": 33,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 6506,
    "max_score": 194.77722,
    "hits": [
      {
        "_index": "rugsearchv22",
        "_type": "rugsearchv22",
        "_id": "https://www.rug.nl/staff/s.s.m.peters/projects",
        "_score": 194.77722,
        "_source": { ....

3rd search:
{
  "took": 69,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 6506,
    "max_score": 194.68721,
    "hits": [
      {
        "_index": "rugsearchv22",
        "_type": "rugsearchv22",
        "_id": "https://www.rug.nl/staff/s.s.m.peters/projects",
        "_score": 194.68721,
        "_source": { ....

4th search:
{
  "took": 25,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 6506,
    "max_score": 194.64012,
    "hits": [
      {
        "_index": "rugsearchv22",
        "_type": "rugsearchv22",
        "_id": "https://www.rug.nl/staff/s.s.m.peters/projects",
        "_score": 194.64012,
        "_source": { ....

At some point the score of the 1st hit drops below that of the 2nd hit, and the results are shown in a different order.

Data hasn't changed in the meantime.
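The drift is easier to see when the top hit is pulled out of each response. A small Python sketch that uses only the `_id` and `_score` values quoted above; the response dicts are trimmed-down stand-ins containing just the fields the functions read, and the helper names are hypothetical:

```python
# Minimal stand-ins for the four responses above: only the fields used below are kept.
TOP_ID = "https://www.rug.nl/staff/s.s.m.peters/projects"
responses = [
    {"hits": {"hits": [{"_id": TOP_ID, "_score": s}]}}
    for s in (194.98152, 194.77722, 194.68721, 194.64012)
]


def top_hit(response: dict) -> tuple:
    """Return (_id, _score) of the first hit in a search response."""
    first = response["hits"]["hits"][0]
    return first["_id"], first["_score"]


def score_deltas(responses: list) -> list:
    """Change in the top hit's score between consecutive runs, rounded to 5 decimals."""
    scores = [top_hit(r)[1] for r in responses]
    return [round(b - a, 5) for a, b in zip(scores, scores[1:])]


def top_hit_changed(responses: list) -> bool:
    """True if the top-ranked document is not the same in every run."""
    return len({top_hit(r)[0] for r in responses}) > 1


print(score_deltas(responses))
print(top_hit_changed(responses))
```

Every delta is negative and the top document is unchanged across the four runs, which matches the description: the score keeps sliding without the ranking flipping yet.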

Best Regards,

Marten

myspacebarisbroken · January 9, 2021, 6:41am

Try running an explain query to see why the document is scored the way it is.

Something like this:

curl -XGET 'https://localhost:9200/rugsearchv22/rugsearchv22/<somedocid>/_explain?pretty=1' -H 'Content-Type: application/json' -d '
{
   "query" : {
      "match" : {
         "title" : "life"
      }
   }
}'

I'm not sure what's going on with your _id field or why it's a URL, but this should give some insight into why that document is scored the way it is.
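One practical detail: because the `_id` values in this index are URLs, the document id in the `_explain` path has to be percent-encoded, otherwise its slashes are read as extra path segments. A minimal Python sketch that builds (but does not send) such a request for the top hit from this thread; the base URL is an assumption, the query reuses the `multi_match` from the original post, and POST is used here as an alternative to the GET-with-body in the curl example:

```python
import json
import urllib.parse
import urllib.request

ES_URL = "http://localhost:9200"  # assumption: hypothetical cluster address


def explain_request(index: str, doc_type: str, doc_id: str, query: dict) -> urllib.request.Request:
    """Build a POST /<index>/<type>/<id>/_explain request with a percent-encoded id."""
    # safe="" makes quote() encode '/' and ':' too, so the URL-shaped _id stays one path segment.
    encoded_id = urllib.parse.quote(doc_id, safe="")
    url = f"{ES_URL}/{index}/{doc_type}/{encoded_id}/_explain?pretty"
    body = json.dumps({"query": query}).encode("utf-8")
    return urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}, method="POST"
    )


query = {
    "multi_match": {
        "query": "promotie",
        "fields": ["d_author", "d_body", "d_snippetContent"],
    }
}
req = explain_request(
    "rugsearchv22",
    "rugsearchv22",
    "https://www.rug.nl/staff/s.s.m.peters/projects",
    query,
)
print(req.full_url)
```

Sending it with `urllib.request.urlopen(req)` against a live cluster should return the explanation tree, including the `docFreq` and `docCount` values behind each idf, so those numbers can be compared between runs.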

Read more on scoring here: https://www.elastic.co/guide/en/elasticsearch/reference/5.6/recipes.html#consistent-scoring

That link and the paragraph directly below it (https://www.elastic.co/guide/en/elasticsearch/reference/5.6/recipes.html#_relevancy_looks_wrong) explain what's happening.

To put it simply, scores are calculated from each shard's own statistics, and those statistics can change.
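To see why per-shard statistics matter, here is a small Python illustration of the BM25 idf term. BM25 is the default similarity in ES 5.x, and the formula below is Lucene's `log(1 + (docCount - docFreq + 0.5) / (docFreq + 0.5))`; the shard statistics are invented numbers, not taken from this index:

```python
import math


def bm25_idf(doc_freq: int, doc_count: int) -> float:
    """Lucene-style BM25 idf: log(1 + (docCount - docFreq + 0.5) / (docFreq + 0.5))."""
    return math.log(1 + (doc_count - doc_freq + 0.5) / (doc_freq + 0.5))


# Hypothetical statistics for one term on two shards of the same index (invented numbers).
shard_stats = {
    "shard 0": {"doc_freq": 120, "doc_count": 10_000},
    "shard 1": {"doc_freq": 300, "doc_count": 10_000},
}

# With the default query_then_fetch, each shard scores with its own idf,
# so two otherwise identical matches on different shards get different scores.
for name, stats in shard_stats.items():
    print(name, round(bm25_idf(stats["doc_freq"], stats["doc_count"]), 4))
```

Any change in a shard's `doc_freq` or `doc_count` changes that shard's idf, and with it every score computed on that shard.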

There are two fixes for this:

(Not optimal) Use only a single shard.
(Better solution) Use dfs_query_then_fetch: https://www.elastic.co/guide/en/elasticsearch/reference/5.6/search-request-search-type.html#dfs-query-then-fetch

This works because ES first makes an extra round trip to all shards to collect the term and document statistics relevant to the query. The coordinating node then merges those statistics and sends the merged result along with the request when it asks each shard to run the query phase. That way, all shards score against a single set of global statistics rather than their own, which produces consistent scoring.
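The merge step described above can be sketched the same way: sum the per-shard term and document counts, then compute one idf that every shard uses. A minimal Python illustration under the same assumptions (invented statistics, Lucene's BM25 idf formula); it also builds the search URL with `search_type=dfs_query_then_fetch`, where the base URL is hypothetical:

```python
import math
from urllib.parse import urlencode

ES_URL = "http://localhost:9200"  # assumption: hypothetical cluster address


def bm25_idf(doc_freq: int, doc_count: int) -> float:
    """Lucene-style BM25 idf: log(1 + (docCount - docFreq + 0.5) / (docFreq + 0.5))."""
    return math.log(1 + (doc_count - doc_freq + 0.5) / (doc_freq + 0.5))


# Hypothetical per-shard statistics for one term, as gathered in the DFS round trip.
shard_stats = [
    {"doc_freq": 120, "doc_count": 10_000},
    {"doc_freq": 300, "doc_count": 10_000},
    {"doc_freq": 45, "doc_count": 9_500},
]


def merged_idf(stats: list) -> float:
    """Sum docFreq and docCount over all shards, then compute a single global idf."""
    total_df = sum(s["doc_freq"] for s in stats)
    total_dc = sum(s["doc_count"] for s in stats)
    return bm25_idf(total_df, total_dc)


local_idfs = [bm25_idf(s["doc_freq"], s["doc_count"]) for s in shard_stats]
global_idf = merged_idf(shard_stats)

search_url = f"{ES_URL}/rugsearchv22/_search?" + urlencode(
    {"search_type": "dfs_query_then_fetch"}
)

print([round(x, 3) for x in local_idfs], round(global_idf, 3))
print(search_url)
```

Each shard would otherwise use a different local idf; with the merged value they all agree. The cost is the extra round trip to every shard on each search.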

warkolm · January 11, 2021, 12:00am

Please format your code/logs/config using the </> button, or markdown-style backticks. It makes things easier to read, which helps us help you.

Marten · January 11, 2021, 4:28pm

@warkolm, I tried that but didn't get it right; I'll try again next time.
@myspacebarisbroken, I haven't looked at the explain output yet, but I will try that.
The reason I didn't check it is that the data didn't change and I have zero replicas, so the same query always hits the same shards.

I will try to find the cause in the explain output.

Thanks so far anyway.

Marten

system · February 8, 2021, 4:28pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
