We have been using ES in production for quite some time now, but today we came across a problem that we have never seen before:
We found two indices in our ES 6.8.14 cluster, each with one shard where primary and replica show different sync_ids and vastly different document counts. I reckon that means they are totally out of sync. I am having a hard time understanding how the cluster health can be "green" under these circumstances, though.
curl -H 'Content-Type: application/json' -XGET "prod-db01:9200/_cat/shards/archive1599113539"
archive1599113539 2 r STARTED 6720 40.4mb 10.0.82.232 prod-db01
archive1599113539 2 p STARTED 656 4.3mb 10.0.82.233 prod-db02
As you can see, the document count is far greater on the replica. The problem appears to have gone undetected for weeks at least (the cluster has been "green" the whole time), so our backups are probably unusable. Also, the usually recommended fix of setting the replica count to 0 and then back to 1 would probably cause significant data loss, because the replica holds more documents than the primary.
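For reference, the approach I mean is roughly this (just a sketch, using the index from above); as far as I understand it, it keeps only the primary copies and throws away the replica that holds the extra documents (the second call would only be run once the replicas are gone, so the new replica is rebuilt from the smaller primary):
curl -H 'Content-Type: application/json' -XPUT "prod-db01:9200/archive1599113539/_settings" -d '{"index":{"number_of_replicas":0}}'
curl -H 'Content-Type: application/json' -XPUT "prod-db01:9200/archive1599113539/_settings" -d '{"index":{"number_of_replicas":1}}'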
I am somewhat at a loss as to how to deal with this situation, so I'd like to ask:
How can the cluster be green? Isn't this a bug?
Is there ANY conceivable way of evaluating or dumping the data in the primary and the replica separately, so we can attempt a merge and recover data that would otherwise be lost?
How do I prevent this from happening again, and do you have tips for detecting this type of failure?
There seems to be no such setting at all. It is a three-node cluster. All nodes have the roles 'master', 'data' and 'ingest' set.
These are the contents of the discovery settings:
OK, if you have not set discovery.zen.minimum_master_nodes then that would explain it. There should be warnings about it in your logs, looking like this:
value for setting "discovery.zen.minimum_master_nodes" is too low. This can result in data loss!
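For a three-node cluster where all nodes are master-eligible the quorum is 2, so roughly speaking you would want this in elasticsearch.yml on every node:
discovery.zen.minimum_master_nodes: 2
It can also be applied dynamically through the cluster settings API, something like:
curl -H 'Content-Type: application/json' -XPUT "prod-db01:9200/_cluster/settings" -d '{"persistent":{"discovery.zen.minimum_master_nodes":2}}'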
Nothing very easy or robust, sorry. You could try using search preference to extract the contents of each shard so you can compare them. You'll need to do that for every shard, even the ones with matching doc counts, just to check that they really have the same docs in them and nothing got messed up in their mappings either.
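Roughly speaking (untested here, so treat it as a sketch), something like this might work, using the index and hosts from your _cat/shards output. Sending the search to each data node directly with preference=_shards:2|_only_local should restrict it to that node's local copy of shard 2:
curl -H 'Content-Type: application/json' -XGET "prod-db01:9200/archive1599113539/_search?preference=_shards:2|_only_local&scroll=1m" -d '{"size":1000,"sort":["_doc"],"query":{"match_all":{}}}'
Repeat against prod-db02:9200 for the other copy, page through the rest with the scroll API, and then diff the two sets of _id values to see which documents exist in only one copy.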
Thanks again! It looks like we are going to be able to recover most of the data.
I'd like to get back to the original question, though: I still fail to comprehend why inconsistent replicas are not a sufficient condition to trigger a health warning.
It sort of does. The cluster health depends only on whether the shards are assigned or not, and the assignment process includes checks to make sure that all the copies are in sync. Unfortunately, by configuring discovery.zen.minimum_master_nodes incorrectly you end up with the information about which copies are in sync itself being out of sync, and there's not really a way to address that in general.
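If you want to see what the cluster currently believes, you can (roughly) look at the allocation IDs the master has recorded as in-sync in the cluster state, and at the per-copy doc counts and sync_id from _cat/shards; copies that disagree there are a sign that something like this has happened:
curl -XGET "prod-db01:9200/_cluster/state/metadata/archive1599113539?filter_path=metadata.indices.*.in_sync_allocations&pretty"
curl -XGET "prod-db01:9200/_cat/shards/archive1599113539?h=index,shard,prirep,state,docs,node,sync_id"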
This is fixed in 7.x, in the sense that it is no longer possible to misconfigure Elasticsearch to lose data in this fashion.