For one particular index, I've been having issues with the primary and replicas repeatedly getting out of sync.
When updating this index, I make a delete-by-query request to delete all documents with a particular property, followed immediately by a bulk insert to re-add the updated documents.
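Roughly, each update cycle looks like this (a minimal sketch using the 5.x Python client; the field name, type name, and document contents are placeholders, only the delete-then-bulk-insert shape matters):

```python
from elasticsearch import Elasticsearch, helpers

es = Elasticsearch(["localhost:9200"])

INDEX = "inventory_suggestions_v1"   # the index that keeps falling out of sync
GROUP_FIELD = "group_id"             # placeholder for the "particular property"

def replace_group(group, updated_docs):
    # 1. Delete every existing document in the group...
    es.delete_by_query(
        index=INDEX,
        body={"query": {"term": {GROUP_FIELD: group}}},
    )
    # 2. ...then immediately bulk-insert the updated versions.
    helpers.bulk(es, (
        {"_op_type": "index", "_index": INDEX, "_type": "doc", "_source": doc}
        for doc in updated_docs
    ))

replace_group("group-42", [{"group_id": "group-42", "name": "example"}])
```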
ES is version 5.6 running in a 5-node cluster.
I haven't been able to consistently reproduce it, and I can fix it temporarily by switching the replicas to 0 and back to 1 to get ES to rebuild them, but the issue seems to crop back up after a day or so.
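For reference, the temporary fix is just toggling the replica count so ES rebuilds the replicas from the primary (same illustrative names as the sketch above):

```python
from elasticsearch import Elasticsearch

es = Elasticsearch(["localhost:9200"])
INDEX = "inventory_suggestions_v1"  # illustrative, as above

# Drop the replicas, then add them back so ES re-copies them from the primary.
es.indices.put_settings(index=INDEX, body={"index": {"number_of_replicas": 0}})
es.indices.put_settings(index=INDEX, body={"index": {"number_of_replicas": 1}})

# Wait for the rebuilt replicas to be allocated again.
es.cluster.health(index=INDEX, wait_for_status="green", request_timeout=300)
```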
We have seen very occasional reports of this and have been investigating, but it has proved extremely tricky for us to reproduce. We need help from someone like you who sees this problem regularly enough to help us diagnose it.
Please could you tell us more about this cluster and the environment in which it lives? For instance: what version are you running exactly? What is it running on? How frequently are you doing the bulk-delete-and-insert that you describe? What other activity does the cluster see?
Would you be willing to run the support diagnostics tool on your cluster and share the results? Don't post them here: I'll get you an email address to use if you can run this.
Would you be able to enable the following very verbose logging, and then toggle the replica count to 0 and back to 1 to make sure everything starts out in sync? To repeat: this is very verbose, so it will cause extra I/O and may fill up your disks; proceed with caution here.
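For illustration, transient logger levels can be changed through the cluster settings API; the specific loggers below are examples only, not necessarily the exact set we'd ask you to enable:

```python
from elasticsearch import Elasticsearch

es = Elasticsearch(["localhost:9200"])

# Example only: bump a couple of ES loggers to TRACE via a transient setting,
# so the change disappears again on a full cluster restart.
es.cluster.put_settings(body={
    "transient": {
        "logger.org.elasticsearch.index.shard": "TRACE",
        "logger.org.elasticsearch.action.bulk": "TRACE",
    }
})
```

After that, the same replica toggle as in the earlier sketch puts the shards back into a known in-sync state before you start watching for divergence.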
In case it helps, we've only so far been able to reproduce anything like this by simulating some very strange networking failures that coincide with shards being reallocated, and even then it's very sporadic.
The cluster is a 5-node cluster, all nodes running ES 5.4.0 as master/client/data, all CentOS 7.3 VMs with no plugins. The cluster has about 5 indexes in it. The largest has about 1.6M documents with a fair amount of churn and has never gotten out of sync; it updates by diffing and performing bulk updates/deletes on individual documents, which, aside from the number of documents, is the only significant difference between it and the index that is getting out of sync. The index causing trouble is pretty new and has just a couple thousand documents, but it is updated by deleting groups of documents (via delete_by_query) and re-adding them.
I'll see if I can enable verbose logging. Unfortunately I can only reproduce this at the moment on our production cluster so will have to check about running the diagnostics tool.
We turned on the trace logging and were able to reproduce it getting out of sync on one of the shards. Seems to only be ~15 documents off at the moment. What's the best way to share the logs?
Please could you zip them up and send them to me at david.turner@elastic.co? I'm unlikely to look at them before 0900 UTC Monday now, so I can't promise an immediate response.
Thanks for the logs, they're much appreciated. We have come up with one hypothesis about what might be happening here, but unfortunately we cannot test it from those logs alone. Could you possibly repeat the period of trace logging with the same settings, starting from a point where the shards are all in sync, wait for them to fall out of sync, and then grab a list of all the document IDs on both the primary and the replica, as well as the logs? Ideally we'd like the indexing process to be stopped, and for you to perform a refresh before querying for the doc IDs, to make sure that we get everything.
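One way to grab the IDs from both copies, once indexing is stopped, is to refresh and then scan the index twice with preference set to _primary and then _replica (a sketch using the 5.x Python client; those preference values exist in 5.x):

```python
from elasticsearch import Elasticsearch, helpers

es = Elasticsearch(["localhost:9200"])
INDEX = "inventory_suggestions_v1"  # illustrative

# With indexing stopped, refresh so every acknowledged operation is searchable.
es.indices.refresh(index=INDEX)

def doc_ids(preference):
    """Return the set of _id values visible on the chosen shard copy."""
    hits = helpers.scan(
        es,
        index=INDEX,
        query={"query": {"match_all": {}}, "_source": False},
        preference=preference,
    )
    return {hit["_id"] for hit in hits}

primary_ids = doc_ids("_primary")
replica_ids = doc_ids("_replica")
print("only on primary:", sorted(primary_ids - replica_ids))
print("only on replica:", sorted(replica_ids - primary_ids))
```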
Awesome. We didn't find exactly what we expected, but we weren't far off. It seems there are occasions where you index a document and delete it very soon afterwards (before the indexing operation has even returned to the client), and the indexing and deletion operations arrive at the replica in the wrong order; for some reason (still under investigation) they are not being put back into the right order there. We can now reproduce this with a single document.
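The triggering pattern, from the client's point of view, is roughly this (a sketch only; it shows the timing that can go wrong, it won't reliably reproduce the divergence on its own):

```python
import threading
from elasticsearch import Elasticsearch
from elasticsearch.exceptions import NotFoundError

es = Elasticsearch(["localhost:9200"])
INDEX, TYPE, DOC_ID = "inventory_suggestions_v1", "doc", "example-1"  # illustrative

def index_doc():
    es.index(index=INDEX, doc_type=TYPE, id=DOC_ID, body={"field": "value"})

def delete_doc():
    try:
        es.delete(index=INDEX, doc_type=TYPE, id=DOC_ID)
    except NotFoundError:
        pass  # the delete can race ahead of the index on the primary

# The delete is issued while the index request may still be in flight.
threads = [threading.Thread(target=index_doc), threading.Thread(target=delete_doc)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```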
As a workaround for you for now, I think it'd be sufficient to avoid running concurrent deletion and indexing operations on your inventory_suggestions_v1 index. Could you try that?
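In practice that means making sure each delete has fully completed, and is visible, before any re-indexing of those documents starts, and never letting two update cycles overlap. A minimal sketch of a serialized version of the earlier cycle (same illustrative names):

```python
from elasticsearch import Elasticsearch, helpers

es = Elasticsearch(["localhost:9200"])
INDEX = "inventory_suggestions_v1"

def replace_group_serially(group, updated_docs):
    # wait_for_completion=True blocks until the delete has finished, and
    # refresh=True makes the deletions visible before any new indexing starts.
    es.delete_by_query(
        index=INDEX,
        body={"query": {"term": {"group_id": group}}},  # placeholder field
        wait_for_completion=True,
        refresh=True,
    )
    helpers.bulk(es, (
        {"_op_type": "index", "_index": INDEX, "_type": "doc", "_source": doc}
        for doc in updated_docs
    ), refresh=True)
```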
I put some measures in place to prevent concurrent indexing and it hasn't gotten out of sync since. As a longer-term workaround, I'm also changing the indexing strategy to do more targeted updates/deletes, since I think that would decrease the churn considerably for my use case.
EDIT: As seems to be the way with this kind of "documentation", you want to skip ahead to the last section, titled "Some final words about deletes." What you need to know is never at the start!
I'd have to guess that you're re-using _id values within the window of ES' deleted-document garbage collection process, which would be unrelated to concurrent deleting/indexing.
There are likely a few solutions, but always using an (index-wide) increasing version number for each new doc is one way to fix this. Using ES' auto-generated _ids is probably another, but I haven't confirmed that approach.
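For what it's worth, the external-version idea would look roughly like this (a sketch; the millisecond timestamp stands in for a proper always-increasing counter, which is the hard part to get right):

```python
import time
from elasticsearch import Elasticsearch

es = Elasticsearch(["localhost:9200"])

# Placeholder counter: a real implementation needs a value that is strictly
# increasing across the whole index, even across clock skew and restarts.
version = int(time.time() * 1000)

es.index(
    index="inventory_suggestions_v1",  # illustrative names
    doc_type="doc",
    id="some-doc-id",
    body={"field": "value"},
    version=version,
    version_type="external",
)
```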
The documentation on the relationship between deletes and versioning is indeed quite scarce, and I agree that this should be properly spelled out in the reference manual. It is, however, not relevant in this case.
The OP is not re-using document IDs.
Assigning document IDs based on an external counter is certainly possible, but it's quite tricky to make it robust to all the things that might go wrong in your system, particularly network partitions and GC pauses. Auto-generated IDs allow Elasticsearch to do this for you, so I'd say to use that functionality unless you have a very compelling reason to use externally-assigned IDs.
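Concretely, using auto-generated IDs just means omitting the id when indexing and letting ES hand one back (5.x Python client sketch, illustrative names):

```python
from elasticsearch import Elasticsearch

es = Elasticsearch(["localhost:9200"])

# No id= argument, so Elasticsearch assigns a unique _id to the document.
result = es.index(
    index="inventory_suggestions_v1",
    doc_type="doc",
    body={"field": "value"},
)
print(result["_id"])  # the auto-generated ID
```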