Different results from two clusters in different data centers

Youxu · October 21, 2015, 4:29pm

We deployed our ES cluster in two Azure data centers, and the two clusters contain exactly the same index schema and documents.

The issue is that when we query the same terms, the results from the two clusters are slightly different, even though we set the number of primary shards to 1.

Is this expected ES behavior? What should we do to force the two clusters to return exactly the same results?

dadoonet · October 21, 2015, 5:04pm

How is it different?

Do you have a sample?

Youxu · October 21, 2015, 5:59pm

For a certain query, the top 2 documents from the first cluster are:

    "total": 26,
    "max_score": 4.186072,
    "hits": [
        {
            "_score": 4.186072,
            "_type": "ja-jp",
            "_source": {
                "DocumentId": "1041_1",
                "Language": "Japanese (Japan)"
            }
        },
        {
            "_score": 3.3488574,
            "_type": "ja-jp",
            "_source": {
                "DocumentId": "1041_2",
                "Language": "Japanese (Japan)"
            }
        }

And from the second cluster:

    "total": 26,
    "max_score": 4.186072,
    "hits": [
        {
            "_score": 4.186072,
            "_type": "ja-jp",
            "_source": {
                "DocumentId": "1041_1",
                "Language": "Japanese (Japan)"
            }
        },
        {
            "_score": 3.3488574,
            "_type": "de-de",
            "_source": {
                "DocumentId": "1031_1",
                "Language": "German (Germany)"
            }
        }

As you can see, the total document count, the max score, and the scores of the top 2 documents are all the same. But the 2nd document differs between the two clusters, even though its score is the same.

dadoonet · October 21, 2015, 6:50pm

Does 1031_1 also appear in the first cluster's results with the same score?

Youxu · October 22, 2015, 12:55am

Thanks for the reply.
Yes, you are right: 1031_1 also appears in the first cluster's results (at the 6th position), and its score is also 3.3488574. I think that is why we got different results from the two clusters.

Is there any way to force the two clusters to always return exactly the same documents in the same order?

dadoonet · October 22, 2015, 4:51am

You can sort by _score first and then by another field (such as a date) to break ties between documents with equal scores.
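A minimal Python sketch of this suggestion, assuming the tiebreaker is a `DocumentId` field mapped as an exact-value (not_analyzed) string; the query field `content` and the search text are hypothetical placeholders. The `sort` array is the part of the request body that matters. `order_hits` is a hypothetical helper that mimics the same two-level ordering locally, to show that the final order no longer depends on the order in which equally scored hits arrive.

```python
# Hypothetical tiebreaker: assumes "DocumentId" is mapped as an exact-value string field.
TIEBREAKER = "DocumentId"

# Search request body: sort by relevance first, then by the tiebreaker field.
search_body = {
    "query": {"match": {"content": "some terms"}},  # hypothetical field and query text
    "sort": [
        {"_score": {"order": "desc"}},
        {TIEBREAKER: {"order": "asc"}},
    ],
}


def order_hits(hits):
    """Mimic the two-level sort locally: score descending, then tiebreaker ascending."""
    return sorted(hits, key=lambda h: (-h["_score"], h["_source"][TIEBREAKER]))


# The same three hits, with the tied ones returned in different orders by each cluster.
cluster_a = [
    {"_score": 4.186072, "_source": {"DocumentId": "1041_1"}},
    {"_score": 3.3488574, "_source": {"DocumentId": "1041_2"}},
    {"_score": 3.3488574, "_source": {"DocumentId": "1031_1"}},
]
cluster_b = [cluster_a[0], cluster_a[2], cluster_a[1]]

ids_a = [h["_source"]["DocumentId"] for h in order_hits(cluster_a)]
ids_b = [h["_source"]["DocumentId"] for h in order_hits(cluster_b)]
print(ids_a)  # ['1041_1', '1031_1', '1041_2']
print(ids_a == ids_b)  # True
```

Any field with a unique value per document works as the last sort key. A date on its own can still tie, so if dates can repeat, a unique field can be appended as a third key.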
If you really want exactly the same results, you need exactly the same data files everywhere (on primaries and replicas alike).

Cluster 1:

1. Set replicas to 0
2. Snapshot the index
3. Set replicas to 1

Cluster 2:

1. Delete the old index
2. Restore the index from the repository you created earlier
3. Set replicas to 1
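The two procedures can be written down as a Python list of REST calls (method, path, body) instead of being executed against a live cluster. The index name `docs`, repository `shared_repo` and snapshot name `sync_1` are hypothetical, and the sketch assumes the repository is already registered on both clusters and backed by storage that both data centers can reach. Dropping replicas to 0 and then setting them back to 1 presumably serves to make the new replicas copies of the primaries' files.

```python
# Hypothetical names: adjust to your own index, repository and snapshot.
INDEX = "docs"
REPO = "shared_repo"  # assumed registered on both clusters, pointing at the same storage
SNAPSHOT = "sync_1"


def set_replicas(n):
    """Build an update-settings call that changes the replica count of INDEX."""
    return ("PUT", f"/{INDEX}/_settings", {"index": {"number_of_replicas": n}})


cluster1_steps = [
    set_replicas(0),
    ("PUT", f"/_snapshot/{REPO}/{SNAPSHOT}?wait_for_completion=true", {"indices": INDEX}),
    set_replicas(1),
]

cluster2_steps = [
    ("DELETE", f"/{INDEX}", None),
    ("POST", f"/_snapshot/{REPO}/{SNAPSHOT}/_restore?wait_for_completion=true", {"indices": INDEX}),
    set_replicas(1),
]

# Print the calls in order; a real script would send them with an HTTP client.
for name, steps in [("Cluster 1", cluster1_steps), ("Cluster 2", cluster2_steps)]:
    print(name)
    for method, path, body in steps:
        print(" ", method, path, body if body is not None else "")
```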

But as soon as you index new data, the clusters could diverge again.

It's not really what I'd do.

Sorting by _score alone means that you don't care about anything other than the score. So even if the results differ between the two clusters, both are correct.
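To check whether two score-sorted responses are "equally correct" in this sense, one can compare the score at each rank instead of the document IDs. The helper `equivalent_up_to_ties` below is hypothetical; it is applied to the top 2 hits posted earlier in this thread.

```python
def equivalent_up_to_ties(hits_a, hits_b):
    """True if both result pages have the same length and the same score at every rank.

    When _score is the only sort key, documents with equal scores may appear in any
    order (or fall on either side of the page boundary), so matching score
    sequences are all that can be expected.
    """
    scores_a = [score for _, score in hits_a]
    scores_b = [score for _, score in hits_b]
    return scores_a == scores_b


# Top 2 hits reported by the two clusters: (DocumentId, _score).
first = [("1041_1", 4.186072), ("1041_2", 3.3488574)]
second = [("1041_1", 4.186072), ("1031_1", 3.3488574)]

print(equivalent_up_to_ties(first, second))  # True
```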

Youxu · October 22, 2015, 10:12am

Sorting by _score and then by another field makes sense. Thanks, David!
