GET API by doc_id returns a different result each time I try

Hi, I have managed more than 100 ES clusters at my company for 3 years.
But last week I ran into a very strange issue, one I thought was not possible. Could you please take a careful look?

ES version: 6.8.2
Cluster health: Green

The GET API by doc_id returns a different result each time I try.
As you can see in the picture, sometimes it says there is no such document, and sometimes it returns the document.

I know a scored search can return different results, since primary and replica shards can merge at different times. I have already read Getting consistent scoring | Elasticsearch Guide [6.8] | Elastic.
But this is the GET API, and it should return a consistent result, shouldn't it?

And this symptom is not transient. It has been ongoing for a week and is still happening.
This ES cluster has an indexing rate of more than 100K ops/s, so merges happen quite regularly.

And when I try a scored search with the preference parameter:
when I use _primary, there is a result
when I use _replica, there isn't a result
(See the pictures below as well.) The replica count is 1.
So it seems the primary shard has the document and the replica doesn't.
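For reference, the two searches were roughly like this (the index name and document id below are placeholders, not my real ones; the _primary/_replica preference values exist in 6.x but were removed in 7.0):

```
GET /my-index/_search?preference=_primary
{
  "query": { "ids": { "values": ["my-doc-id"] } }
}

GET /my-index/_search?preference=_replica
{
  "query": { "ids": { "values": ["my-doc-id"] } }
}
```

The first consistently returns one hit, the second consistently returns zero.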

I tried using "realtime=false" but the same symptom happened.
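The two GET variants I tried looked like this (index and id are placeholders, and I am assuming the mapping type is _doc):

```
GET /my-index/_doc/my-doc-id                  # real-time GET: can serve the doc from the translog before a refresh
GET /my-index/_doc/my-doc-id?realtime=false   # non-real-time GET: only sees segments made visible by a refresh
```

Both behave the same way: sometimes found, sometimes not.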

As I said, cluster health is Green, and when I checked the master logs there were no errors.

We ran a DR test last week, taking one AZ in an AWS region down to see whether our system stays healthy, and this symptom started after that test.
During the DR test our ES cluster failed over successfully. There was about 3 minutes of downtime, but after that the remaining nodes worked well. The cluster state at that time was yellow, not red.

So I also suspect that the DR situation caused the primary and replica to diverge... but if so, shouldn't it be fixed automatically? ES does not seem to detect this out-of-sync problem.

Any opinions would be much appreciated.

I suspect your cluster is incorrectly configured and may be suffering from a split-brain scenario.

How many master eligible nodes does the cluster have?

What is discovery.zen.minimum_master_nodes set to in your configuration?

For the cluster to be correctly configured, this parameter should be set to the number of master-eligible nodes required to form a strict majority in the cluster, e.g. 2 if you have 2 or 3 master-eligible nodes, 3 if you have 4 or 5 master-eligible nodes, etc. This has always been a common area of misconfiguration and can cause data loss and inconsistencies similar to what you are seeing. The setting was removed in version 7.0 as resiliency was improved, so it is not an issue in newer versions. I would recommend upgrading any older clusters.
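As a sketch, for a cluster with three master-eligible nodes the elasticsearch.yml on each of them would contain:

```yaml
# 6.x only; this setting was removed in 7.0.
# With 3 master-eligible nodes, a strict majority is 2:
discovery.zen.minimum_master_nodes: 2
```

If this is left at a value below the majority, two disjoint groups of masters can each elect a leader after a network partition, which is exactly the split-brain condition that corrupts shard copies.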

No, I don't think it is a split-brain issue.
We have 3 master nodes and discovery.zen.minimum_master_nodes is 2.
Each master is in zone A, B, or C. So in the DR situation only the master in zone A went down, and there wasn't any split-brain issue.
Right now, all 3 masters are joined to the cluster.

If minimum_master_nodes is configured correctly, I would expect Elasticsearch to recover correctly. Do you have any custom settings around recovery or translog durability that could impact this?

I recall there were resiliency issues in Elasticsearch 6 and earlier, which is why significant improvements to resiliency and stability were made in version 7.

To resolve this, you should be able to drop the replica and then recreate it, so it is guaranteed to be a copy of the primary. To avoid this in the future, I would again recommend upgrading.
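A minimal sketch of that, assuming the affected index is called my-index:

```
# Drop the out-of-sync replica:
PUT /my-index/_settings
{ "index": { "number_of_replicas": 0 } }

# Recreate it; the new replica is built as a fresh copy of the current primary:
PUT /my-index/_settings
{ "index": { "number_of_replicas": 1 } }
```

Note the cluster will briefly have no redundancy for that index between the two calls, so do this during a quiet period if you can.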

Here are our discovery.zen cluster settings. Most of them are just the defaults.

And when I checked the index settings, we don't use any custom translog settings.

Okay, I understand what you mean. I want to ask a few more questions to confirm:

  1. While we were testing the DR situation (network down and recovery), if the translog itself or its replay had an issue, could consistency between primary and replica have been broken? Is that the part you suspect?

  2. Is there no way to do a manual sync between primary and replica? Is reducing the replica count and recreating it the only option?

Thanks for your help. If you have a link to a related issue, please share it.

If the replica that comes back is out of date, it should be replaced by a copy of the primary, so I would not expect this unless you are hitting some bug or have some custom setting that reduces reliability. I have not troubleshot this type of issue on version 6 for years, so I do not remember the details.

No, not as far as I know. The addition of sequence numbers in version 7 made this a lot better and more efficient, but in version 6 I think that is the only way.

The network disconnect was short, about 20 minutes.
So the existing replica does not seem to have been replaced. According to our metrics, there was no index recovery, which usually happens when a shard moves.
So the existing replica shard seems to have received a translog replay from the primary.

I want to ask whether there is any ticket related to this kind of issue.
Also, where can I search for related tickets? GitHub or Jira?

And I also want to ask: if we upgrade to version 7, is it guaranteed that this kind of issue won't happen?

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.