Nodes Out of Sync

Bruce_Carey · December 6, 2017, 4:57pm

Hi,

We have several different clusters of ES5.02 running. Most are two nodes. On one, there's a large index with three shards, one replica. What I've noticed is that when queried, I'm getting different search results depending on which node ES decides to draw data from. This prompted me to look at other clusters, and I found the same behavior. I think it was less noticeable because the dataset was smaller.

So, I've been researching how to keep nodes in sync, and what might cause them to go out of sync. All I've found so far is that ES automagically keeps nodes in sync. This obviously isn't happening.

To try and solve this, I thought maybe there was a damaged replica, so I set replicas to 0, then set it back to 1. The nodes were in sync for barely minutes before falling out of sync again.
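For reference, the replica reset described above goes through the index settings API. Below is a minimal Python sketch that only builds the request. The host `http://localhost:9200` and the index name `products` are placeholders, not values from this thread.

```python
import json
import urllib.request


def replica_settings_request(base_url, index, replicas):
    """Build (but do not send) a PUT request that sets number_of_replicas.

    base_url and index are placeholders supplied by the caller.
    """
    body = json.dumps({"index": {"number_of_replicas": replicas}}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/{index}/_settings",
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


# Hypothetical usage: drop the replicas, then add them back so they are
# rebuilt by copying from the primaries.
drop_replicas = replica_settings_request("http://localhost:9200", "products", 0)
restore_replicas = replica_settings_request("http://localhost:9200", "products", 1)
```

Passing each request to `urllib.request.urlopen` would send it. The second call should only go out once the cluster has actually removed the old replica copies.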

The whole point of having multiple nodes is for redundancy, which isn't the case if one node fails and it has an incomplete or incorrect data set.

What is failing here? How do I fix it? Is there a manual command that I could put in a cron to keep the things synced?

Thanks in advance for all ideas and explanations. Feel free to ask for any info I've left out here.

mujtabahussain · December 6, 2017, 9:23pm

You are right. That is the reason why ES works so well.

You should never need to do that

When that happened, was there anything in the logs of any of the nodes that might be relevant? If so, could you please post that as well?

Could you please post your elasticsearch.yml (with private info edited out, of course)? Also, what environment are you running the clusters in?

Bruce_Carey · December 8, 2017, 2:39pm

I get that, but assuming I DID need to do that, as in the instance I'm reporting, how could I?

Environment: Azure, Ubuntu 16.04

It seems there was an issue that was causing ES to restart, and it was happening so quickly that our monitoring wasn't catching it. I believe this was causing the out-of-sync state, as mentioned here: https://www.elastic.co/guide/en/elasticsearch/resiliency/current/index.html

It says that "This is caused by operations that were in-flight when the primary shard failed and may not have been processed on all replica shards. Currently, the discrepancies are not repaired on primary promotion but instead would be repaired if replica shards are relocated"

So, since I'm not relocating shards, I'd need a way to manually trigger a resync.

warkolm · December 8, 2017, 8:24pm

Never do this. It'll break things as you are working against how Elasticsearch is designed to work.

What is out of sync exactly? How are you measuring this?
What sort of data is it? What does your query and results look like?
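One possible way to measure this is to run the same query once against primary shards and once against replicas, then compare the returned document IDs. The sketch below assumes the `_primary` and `_replica` values of the search `preference` parameter, which exist in 5.x but were deprecated and later removed in newer releases. The host, index name, and helper names are hypothetical.

```python
import json
import urllib.parse
import urllib.request


def search_url(base_url, index, preference, size=100):
    """Build a _search URL pinned to one kind of shard copy."""
    query = urllib.parse.urlencode({"preference": preference, "size": size})
    return f"{base_url}/{index}/_search?{query}"


def fetch(url, body):
    """POST a search body and return the parsed JSON response (needs a live cluster)."""
    req = urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        method="POST",
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))


def hit_ids(response):
    """Collect the _id of every hit in a search response."""
    return {hit["_id"] for hit in response["hits"]["hits"]}


def diff_copies(primary_response, replica_response):
    """Report IDs returned by only one of the two responses."""
    primary, replica = hit_ids(primary_response), hit_ids(replica_response)
    return {
        "only_primary": sorted(primary - replica),
        "only_replica": sorted(replica - primary),
    }
```

Hypothetical usage: call `fetch(search_url(base, "products", "_primary"), query)` and the same with `"_replica"`, then pass both responses to `diff_copies`. A difference in the top hits does not by itself prove that documents are missing, since scoring can also differ between copies. Comparing `hits.total` from both responses is a useful second check.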

Bruce_Carey · December 8, 2017, 8:42pm

As to the queries, they're created by a different team. The data is Product info.

When I do a search on the site, either with criteria or an "empty search", the results change back and forth. It's harder to notice on a high traffic site, but on a newer one, it's really easy to spot. It goes back and forth between two distinct sets of products displayed, depending on which node ES has decided to route the traffic to.

I've seen docs about flush and Synced Flush. Assuming I ran the sync one at a time when we know we're not importing new info, would this help to actually sync the nodes, or does it only deal with flushing memory to disk?

warkolm · December 8, 2017, 8:48pm

You need to show exactly what you mean by out of sync. Providing your queries and responses is important, because search is relevance-based and both of these impact results. It could be something as simple as differences in document counts across shards, which is not a fundamental problem for the operation of Elasticsearch, but will impact relevance and can be countered.

It's kinda hard to help here unless we can get direct answers. Flushing won't help.
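To check the document-count theory, one option is to compare the `docs` column of `GET _cat/shards?format=json` between the copies of each shard. The following is a rough Python sketch that works on the already-parsed JSON rows. The function name and the sample rows are made up for illustration.

```python
from collections import defaultdict


def mismatched_shards(cat_shards_rows):
    """Return shards whose copies report different document counts.

    cat_shards_rows is the parsed JSON list from _cat/shards?format=json,
    where each row carries index, shard, prirep ("p" or "r"), docs, and node.
    """
    counts = defaultdict(dict)
    for row in cat_shards_rows:
        if row.get("docs") is None:
            continue  # unassigned copies report no doc count
        key = (row["index"], row["shard"])
        counts[key][f'{row["prirep"]}@{row["node"]}'] = int(row["docs"])
    return {
        key: copies
        for key, copies in counts.items()
        if len(set(copies.values())) > 1
    }


# Made-up sample rows, not output from the cluster discussed here.
sample_rows = [
    {"index": "products", "shard": "0", "prirep": "p", "docs": "100", "node": "node-1"},
    {"index": "products", "shard": "0", "prirep": "r", "docs": "98", "node": "node-2"},
    {"index": "products", "shard": "1", "prirep": "p", "docs": "50", "node": "node-2"},
    {"index": "products", "shard": "1", "prirep": "r", "docs": "50", "node": "node-1"},
]
print(mismatched_shards(sample_rows))
```

Counts can differ briefly while indexing is in progress, so a mismatch is only meaningful on a quiet index. If the counts match but rankings still flip between copies, passing a custom `preference` string (for example a session ID) on each search keeps a given user on the same shard copies.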

Bruce_Carey · December 8, 2017, 9:08pm

I'll talk to one of the guys in Search and get back here asap. Likely Monday unfortunately.

system · January 5, 2018, 9:08pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
