Different results because of replicas


My Elasticsearch 6.x index has 1 shard and 2 replicas, with ~20k documents.
When I make 2 consecutive requests with exactly the same parameters, I get the same number of hits but slightly different scores, i.e.:

  • max_score is different (5.1018233 vs 5.106115)
  • one specific document is at the top of the results for one request, but the same document is scored so that it is fourth in the result list for the next request

Every time I make a request, the result bounces between these 2 variants.

For testing purposes, I added "preference=_primary" and "preference=_replica" to the endpoint and noticed that I get the same two results, depending on the preference value. Querying "_primary" and "_replica" for the number of documents confirmed that both have the same number of documents.

Can anybody explain what is happening, and why?
Can anybody help me get consistent results despite having more than one replica?

Housekeeping merge operations happen independently on each shard copy (the primary and every replica).

  • The number of deleted documents will therefore vary between shard copies.
  • Deleted documents still count towards the total number of documents in the Lucene index that is a shard, until a merge removes them.
  • The total number of documents in the Lucene index is used in IDF (inverse document frequency) scoring, which means replicas can score the same query on the same docs slightly differently depending on merge activity.
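
To make the effect concrete, here is a minimal sketch of the IDF term, assuming the index uses the default BM25 similarity. The document counts are hypothetical numbers chosen for illustration, not values taken from the index in question.

```python
import math


def bm25_idf(doc_count: int, doc_freq: int) -> float:
    """IDF term as used by Lucene's BM25 similarity:
    ln(1 + (docCount - docFreq + 0.5) / (docFreq + 0.5))."""
    return math.log(1 + (doc_count - doc_freq + 0.5) / (doc_freq + 0.5))


# Hypothetical figures: both copies hold the same 20,000 live documents,
# 40 of which contain the query term.
live_docs, live_matches = 20_000, 40

# Copy A has just merged away all of its deleted documents.
idf_a = bm25_idf(live_docs, live_matches)

# Copy B still carries 1,500 not-yet-merged deleted documents,
# 10 of which contain the term (again, made-up numbers).
idf_b = bm25_idf(live_docs + 1_500, live_matches + 10)

print(f"copy A idf: {idf_a:.4f}")
print(f"copy B idf: {idf_b:.4f}")
```

The two copies hold the same live documents but produce different IDF values, and when competing documents have near-equal scores a small shift like this can be enough to reorder them.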

This is one of the reasons the preference setting can take something like a user session ID as a value to ensure each user returns to the same choice of replica.
Of course, ongoing indexing activity and merging can change matters even on the same replica so users shouldn't expect to have a completely stable view.
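
As a rough sketch of that idea, the snippet below builds a search URL that passes a session ID as the preference value. The host, index name, and session ID are placeholders, and the helper `search_url` is hypothetical. The check for a leading underscore reflects my understanding that custom preference strings must not start with `_`, since that prefix is reserved for built-in values such as `_primary`.

```python
from urllib.parse import quote, urlencode


def search_url(base_url: str, index: str, session_id: str) -> str:
    """Build a _search URL that routes a given session to a consistent shard copy."""
    if not session_id or session_id.startswith("_"):
        # Assumption: values starting with "_" are reserved for built-in preferences.
        raise ValueError("custom preference must be non-empty and must not start with '_'")
    query = urlencode({"preference": session_id})
    return f"{base_url}/{quote(index, safe='')}/_search?{query}"


# Placeholder values for illustration only.
print(search_url("http://localhost:9200", "my-index", "user-42"))
```

Two requests carrying the same session ID should be answered by the same copy for as long as that copy stays available.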

The issue here is that a user searches for a document by its title and gets it at the top of the results, but if the user searches for it again, it might be 4th in the results list!?

I would not have an issue if the scoring were slightly different and reduced/increased for all projects in the same way. But giving each user's session ID as the preference parameter, as proposed, would in practice mean that some users get correct scoring and some do not (they will get a worse result list because the document will not be at the top as expected)... or did I get it wrong?

P.S. I have 1 shard, but 2 replicas. Do replicas work like shards? I assumed housekeeping would happen on the primary and would then eventually be replicated to the replicas. Was that a wrong assumption?

It's the way it is because of a number of necessary trade-offs.

What's your definition of "correct"? Presumably, that would be stats based on your index having no deletes at all (a "force merged" index). Maintaining that isn't practical, so we have to work with imperfect stats for fast results.

Yes, that assumption was wrong. Under ordinary indexing activity, JSON documents are replicated, not their indexed form (segment files). Each replica does its own indexing of that JSON content into segment files.
If you design a system around replicating segment files there's a tension: you'd want segment files to be small to minimise the content transferred over networks, but you'd want segment files to be merged into large ones to make searches efficient.

If ordering is critical, then maybe the users within a group can share the same session ID for preference routing? That, or the query is redefined to be less sensitive to IDF ranking.

If, for given search parameters, Elasticsearch with no replicas was able to score the document to the top of the result list, let's call that the "correct" reference. It is the best ES could do with that index configuration and the given search parameters.

To me it is very surprising that, because I decided to have an additional replica, I sometimes don't get that "correct" result anymore (despite knowing that ES could score that specific document better with a specific index configuration), and instead get some other results that a user would rate as "worse".

It feels like one needs to decide between consistently good results and having more replicas. Sorry if I sound too critical, but I still don't have a solution for the issue.

Perfect. So, at the end of indexing that JSON content, are the segment files in _replica the same as in _primary? I really hope indexing is consistent and happens in the same way on _primary and _replica.

"Correct" can vary from second to second if indexing and merging activity is ongoing.
Even on a no-replica system the number one result can move due to other document changes. These changes might be completely unrelated to the documents matching your search. That's part of the user experience - they can't assume they have a stable view of the data if changes are happening.

No, the segment files are not the same. The set of documents should be the same, but the way they are physically organised is different.
There may be different search loads or other performance concerns which cause background housekeeping merge activity to take longer, which means the merge scheduling operations on each replica will diverge over time. As I explained previously, the merging of deletes will change the IDF scores.

If your content is not undergoing change then a "force merge" operation should make replicas rank the same (and it's only when you expect an index to have no more changes that it's appropriate to pay the heavy cost of a "force merge").
In a live index with ongoing changes it's typically too heavy a cost to maintain an identical view of relevance ranking across all replicas. Relevance scores are always changing anyway.
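
For an index that really has stopped changing, a force merge can be requested through the `_forcemerge` endpoint. The sketch below only constructs the request and does not send it. The host and index name are placeholders, the helper `force_merge_request` is hypothetical, and merging down to a single segment via `max_num_segments=1` is my assumption about how to get rid of all deleted documents.

```python
from urllib.request import Request


def force_merge_request(base_url: str, index: str, max_num_segments: int = 1) -> Request:
    """Build (but do not send) a POST request for the _forcemerge endpoint."""
    url = f"{base_url}/{index}/_forcemerge?max_num_segments={max_num_segments}"
    return Request(url, method="POST")


# Placeholder host and index name for illustration only.
req = force_merge_request("http://localhost:9200", "my-index")
print(req.get_method(), req.full_url)
```

Passing `req` to `urllib.request.urlopen` would actually trigger the merge, which is the expensive step and only makes sense once writes to the index have stopped.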

If you wanted relevance ranking and needed it to be stable across a changing index, then you'd essentially have to know all possible search terms in advance (aardvark to zit) and assign each of them a weighted score (so that matches on "aardvark" score more highly than matches on "the"). You'd then need to write searches that used these pre-determined boosts for each search term. This is impractical, of course.

OK, I understand this slightly better now. Can you recommend some resources to learn more about this so I can make better decisions in future? Thanks for all the clarifications.

Here you go:

  • This old article has some background on the DFS search mode. That feature is designed to overcome any scoring bias across different shards. However, it won't solve your problem, which is scoring differences between replicas of the same shard.

  • For a detailed discussion on what might be the pros and cons of segment based replication (copying indexed docs not raw docs) then see this thread

  • As you've already discovered, the preference parameter can help route users to the same choice of replica to try and give some stability (but ongoing indexing will carry on changing order).

  • For a deep dive on the data structures in the indexes see this talk

  • Mike McCandless's animation of Lucene segment merging (Changing Bits: Visualizing Lucene's segment merges) offers an interesting insight.

All-in-all distributed search on an actively changing dataset is challenging and requires a number of trade-offs.
