Hi, I have been managing more than 100 ES clusters at my company for 3 years.
But last week I ran into a very strange issue, one I didn't think was possible. Could you take a careful look at this?
ES version: 6.8.2
Cluster health: Green
A GET API call by doc_id returns a different result each time I try it.
As you can see in the picture, sometimes it says the document doesn't exist, and sometimes it returns the document.
I know a scored search can return different results because the primary and the replica can merge segments at different times. I have already read "Getting consistent scoring | Elasticsearch Guide [6.8] | Elastic".
But this is the GET API, so it should return a consistent result, shouldn't it?
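To reproduce the flip-flopping outside the UI, a minimal sketch (Python, standard library only) can repeat the same GET by id and count how often `found` comes back true or false. The cluster URL, the index name `my-index`, the id `doc-1` and the `_doc` type are placeholders, not values from this cluster; `fetch` is injectable so the counting logic can be checked without a live cluster.

```python
import json
import urllib.error
import urllib.parse
import urllib.request
from collections import Counter

# Placeholder endpoint (assumption): point this at one of the cluster's nodes.
ES_URL = "http://localhost:9200"


def http_get_json(url):
    """GET a URL and decode the JSON body.

    Elasticsearch answers a missing document with HTTP 404 and a JSON body
    ({"found": false, ...}), so the body is also read from HTTPError.
    """
    try:
        with urllib.request.urlopen(url) as resp:
            return json.loads(resp.read().decode("utf-8"))
    except urllib.error.HTTPError as err:
        return json.loads(err.read().decode("utf-8"))


def tally_get_by_id(index, doc_id, attempts=20, fetch=http_get_json):
    """Repeat GET /<index>/_doc/<id> and count the values of "found".

    If nothing deletes or re-creates the document meanwhile, a healthy
    cluster gives a single key; a mix of True and False suggests that
    the shard copies serving the request disagree.
    """
    url = "%s/%s/_doc/%s" % (ES_URL, index, urllib.parse.quote(doc_id, safe=""))
    return Counter(fetch(url).get("found") for _ in range(attempts))


# Usage (hypothetical index and id): print(tally_get_by_id("my-index", "doc-1"))
```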
And this is not transient: it has been going on for a week and is still happening.
This cluster has an indexing rate of more than 100K ops/s, so merges happen quite regularly.
And when I run a scored search with the preference parameter:
with _primary, the document is returned
with _replica, it is not returned
(See also the pictures below.) The replica count is 1.
So it seems the primary shard has the document and the replica doesn't.
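The primary-vs-replica comparison can be scripted the same way. This sketch (placeholder URL, index and id; standard library only) runs an `ids` query once with `preference=_primary` and once with `preference=_replica` and returns `hits.total` for each. On 6.x `hits.total` is a plain integer, and the `_primary`/`_replica` preference values are deprecated since 6.1 and removed in 7.0, so this only applies to 6.x clusters like this one.

```python
import json
import urllib.error
import urllib.parse
import urllib.request

# Placeholder endpoint (assumption).
ES_URL = "http://localhost:9200"


def http_post_json(url, body):
    """POST `body` as JSON and decode the JSON response (also on HTTP errors)."""
    req = urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    try:
        with urllib.request.urlopen(req) as resp:
            return json.loads(resp.read().decode("utf-8"))
    except urllib.error.HTTPError as err:
        return json.loads(err.read().decode("utf-8"))


def hits_by_preference(index, doc_id, preferences=("_primary", "_replica"),
                       fetch=http_post_json):
    """Run an ids query once per preference value; return {preference: hits.total}.

    On 6.x hits.total is an integer (7.x turned it into an object).
    """
    query = {"query": {"ids": {"values": [doc_id]}}}
    totals = {}
    for pref in preferences:
        params = urllib.parse.urlencode({"preference": pref})
        url = "%s/%s/_search?%s" % (ES_URL, index, params)
        totals[pref] = fetch(url, query)["hits"]["total"]
    return totals


# Usage (hypothetical index and id): print(hits_by_preference("my-index", "doc-1"))
```

If the replica really is missing the document, the result would show 1 for `_primary` and 0 for `_replica`.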
I tried using "realtime=false", but the same thing happened.
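The GET API also accepts `preference`, so the realtime=false check can be pinned to each shard copy instead of depending on whichever copy a request happens to hit. A sketch under the same assumptions (placeholder URL, index and id, `_doc` type, standard library only):

```python
import json
import urllib.error
import urllib.parse
import urllib.request

# Placeholder endpoint (assumption).
ES_URL = "http://localhost:9200"


def http_get_json(url):
    """GET a URL and decode the JSON body (a missing doc is a 404 with a JSON body)."""
    try:
        with urllib.request.urlopen(url) as resp:
            return json.loads(resp.read().decode("utf-8"))
    except urllib.error.HTTPError as err:
        return json.loads(err.read().decode("utf-8"))


def found_per_copy(index, doc_id, realtime=False,
                   preferences=("_primary", "_replica"), fetch=http_get_json):
    """GET one document once per preference value; return {preference: found}."""
    path = "%s/%s/_doc/%s" % (ES_URL, index, urllib.parse.quote(doc_id, safe=""))
    result = {}
    for pref in preferences:
        # str(False).lower() -> "false", the form the realtime flag expects.
        params = urllib.parse.urlencode(
            {"realtime": str(realtime).lower(), "preference": pref})
        result[pref] = fetch("%s?%s" % (path, params)).get("found")
    return result


# Usage (hypothetical index and id): print(found_per_copy("my-index", "doc-1"))
```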
As I said, cluster health is Green, and the master node's log currently shows no errors.
Last week we ran a DR test, taking down one AZ in our AWS region to check that our system stays healthy, and this symptom appeared after that test.
During the DR test, the ES cluster failed over successfully. There were 3 minutes of downtime, but after that the remaining nodes worked well. At that point the cluster state was yellow, not red.
So I also suspect that the DR test left the primary and the replica inconsistent... But if so, shouldn't that be fixed automatically? ES does not seem to detect that the copies are out of sync.
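One way to look for copies that have drifted apart, without waiting for a failed GET, is to compare per-copy document counts from `_cat/shards`. This sketch (placeholder URL and index name; standard library only) fetches the `index,shard,prirep,state,docs` columns as JSON and reports shards whose replica count differs from the primary. Counts differ briefly in normal operation because each copy refreshes on its own schedule, so at 100K ops/s a difference is only meaningful on an index that is no longer receiving writes, or after writes are paused and the index is refreshed.

```python
import json
import urllib.error
import urllib.parse
import urllib.request

# Placeholder endpoint (assumption).
ES_URL = "http://localhost:9200"


def http_get_json(url):
    """GET a URL and decode the JSON body (also on HTTP error responses)."""
    try:
        with urllib.request.urlopen(url) as resp:
            return json.loads(resp.read().decode("utf-8"))
    except urllib.error.HTTPError as err:
        return json.loads(err.read().decode("utf-8"))


def cat_shards(index, fetch=http_get_json):
    """Return _cat/shards rows for `index` as a list of dicts (values are strings)."""
    params = urllib.parse.urlencode(
        {"format": "json", "h": "index,shard,prirep,state,docs"})
    return fetch("%s/_cat/shards/%s?%s" % (ES_URL, index, params))


def diverging_shards(rows):
    """Map (index, shard) -> {"p": primary_docs, "r": [replica_docs, ...]} for
    shards whose STARTED replicas report a different doc count than the
    STARTED primary. Unassigned or initializing copies are skipped."""
    copies = {}
    for row in rows:
        if row.get("state") != "STARTED" or row.get("docs") is None:
            continue
        key = (row["index"], int(row["shard"]))
        entry = copies.setdefault(key, {"p": None, "r": []})
        if row["prirep"] == "p":
            entry["p"] = int(row["docs"])
        else:
            entry["r"].append(int(row["docs"]))
    return {
        key: entry
        for key, entry in copies.items()
        if entry["p"] is not None and any(n != entry["p"] for n in entry["r"])
    }


# Usage (hypothetical index name): print(diverging_shards(cat_shards("my-index")))
```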
I would appreciate any thoughts you have.