We have several clusters running ES5.02. Most have two nodes. On one, there's a large index with three shards and one replica. What I've noticed is that when queried, I get different search results depending on which node ES decides to draw data from. This prompted me to look at other clusters, and I found the same behavior there. I think it was less noticeable because their datasets were smaller.
So, I've been researching how to keep nodes in sync and what might cause them to go out of sync. All I've found so far is that ES automagically keeps nodes in sync. That obviously isn't happening.
To try and solve this, I thought there might be a damaged replica, so I set replicas to 0, then set it back to 1. The nodes stayed in sync for only a few minutes before falling out of sync again.
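For reference, the replica toggle described above maps to two calls to the index settings API (`PUT /<index>/_settings` with `number_of_replicas`). The sketch below is a minimal, stdlib-only illustration; the base URL `http://localhost:9200`, the index name `products`, and the helper names are hypothetical placeholders, not anything from the original setup. Dropping replicas to 0 and back to 1 makes ES rebuild the replica from the primary, which is why it only fixes the divergence temporarily if whatever causes it keeps happening.

```python
import json
import urllib.request

# Hypothetical cluster address; adjust for your own setup.
ES_URL = "http://localhost:9200"


def replica_settings_request(index: str, replicas: int, base_url: str = ES_URL) -> urllib.request.Request:
    """Build (but do not send) a PUT /<index>/_settings request changing number_of_replicas."""
    body = json.dumps({"index": {"number_of_replicas": replicas}}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/{index}/_settings",
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


def toggle_replicas(index: str, restore_to: int = 1, base_url: str = ES_URL) -> list:
    """Drop replicas to 0, then restore them, so the replicas are recopied from the primaries."""
    responses = []
    for count in (0, restore_to):
        with urllib.request.urlopen(replica_settings_request(index, count, base_url)) as resp:
            responses.append(json.loads(resp.read()))
    return responses
```

Usage would be `toggle_replicas("products")`; after the second call, wait for cluster health to return to green before comparing results again.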
The whole point of having multiple nodes is for redundancy, which isn't the case if one node fails and it has an incomplete or incorrect data set.
What is failing here? How do I fix it? Is there a manual command I could put in a cron job to keep the nodes in sync?
Thanks in advance for all ideas and explanations. Feel free to ask for any info I've left out here.
It says that "This is caused by operations that were in-flight when the primary shard failed and may not have been processed on all replica shards. Currently, the discrepancies are not repaired on primary promotion but instead would be repaired if replica shards are relocated"
So, since I'm not relocating shards, I'd need a way to manually trigger a resync.
As to the queries, they're created by a different team. The data is product information.
When I do a search on the site, either with criteria or an "empty search", the results change back and forth. It's harder to notice on a high-traffic site, but on a newer one it's really easy to spot. It goes back and forth between two distinct sets of products displayed, depending on which node ES has decided to route the traffic to.
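One documented way to stop a single user's results from flipping between shard copies is the search `preference` parameter: sending the same custom string (for example a session ID) on every request routes those searches to the same shard copies. This is only a sketch, and it masks rather than fixes diverging data; the base URL, index name `products`, and session string below are hypothetical. Custom preference values should not start with an underscore, since those are reserved for built-in options.

```python
import json
import urllib.parse
import urllib.request

# Hypothetical cluster address; adjust for your own setup.
ES_URL = "http://localhost:9200"


def search_request(index: str, query: dict, preference: str, base_url: str = ES_URL) -> urllib.request.Request:
    """Build (but do not send) a search request pinned to shard copies via `preference`."""
    params = urllib.parse.urlencode({"preference": preference})
    body = json.dumps({"query": query}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/{index}/_search?{params}",
        data=body,
        method="POST",
        headers={"Content-Type": "application/json"},
    )
```

For an "empty search" the query would be `{"match_all": {}}`, sent with something like `urllib.request.urlopen(search_request("products", {"match_all": {}}, "session-abc123"))`.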
I've seen docs about flush and synced flush. Assuming I ran a synced flush at a time when we know we're not importing new data, would it actually sync the nodes, or does it only deal with flushing memory to disk?
You need to show exactly what you mean by "out of sync". Providing your queries and responses is important, because search results are ranked by relevance and both of these affect them. It could be something as simple as differences in document counts across shards, which is not a fundamental problem for the operation of Elasticsearch, but does affect relevance and can be countered.
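A concrete way to show whether copies really differ is to compare per-copy document counts from `GET _cat/shards/<index>?format=json`, which reports `index`, `shard`, `prirep`, `state`, `docs` and `node` for each shard copy. The sketch below assumes that JSON shape; the base URL, index name and the sample rows in the usage note are hypothetical. Counts can differ briefly while a refresh is pending, so a persistent mismatch on an idle index is the meaningful signal.

```python
import json
import urllib.request
from collections import defaultdict

# Hypothetical cluster address; adjust for your own setup.
ES_URL = "http://localhost:9200"


def fetch_cat_shards(index: str, base_url: str = ES_URL) -> list:
    """Fetch per-copy shard info for one index as a list of dicts."""
    with urllib.request.urlopen(f"{base_url}/_cat/shards/{index}?format=json") as resp:
        return json.loads(resp.read())


def find_mismatched_shards(cat_shards: list) -> dict:
    """Return {(index, shard): [(prirep, node, docs), ...]} for shards whose copies disagree on doc count."""
    copies = defaultdict(list)
    for row in cat_shards:
        # Skip unassigned or initializing copies; they have no stable doc count.
        if row.get("state") != "STARTED" or row.get("docs") is None:
            continue
        copies[(row["index"], int(row["shard"]))].append((row["prirep"], row.get("node"), int(row["docs"])))
    return {key: rows for key, rows in copies.items() if len({docs for _, _, docs in rows}) > 1}
```

With rows where shard 1's primary reports 980 docs and its replica 975, `find_mismatched_shards` returns only `("products", 1)` together with both copies, which is the kind of evidence the question asks for.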
It's kinda hard to help here unless we can get direct answers. Flushing won't help.