Replicas out of sync

For one particular index, I've been having issues with the primary and replicas repeatedly getting out of sync.


When updating this index, I make a delete by query request to delete all values with a particular property, followed immediately by a bulk insert to re-add the updated values.
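In REST terms the update is roughly the following (index, type, and field names here are simplified illustrations of what the client sends):

    POST /inventory_suggestions/_delete_by_query
    {
      "query": { "term": { "clientId": "42" } }
    }

    POST /inventory_suggestions/search_term/_bulk
    { "index": { "_routing": "42" } }
    { "clientId": "42", "text": "first updated suggestion" }
    { "index": { "_routing": "42" } }
    { "clientId": "42", "text": "second updated suggestion" }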

ES is version 5.4, running in a 5-node cluster.

I haven't been able to consistently reproduce it, and I can fix it temporarily by switching the replicas to 0 and back to 1 to get ES to rebuild them, but the issue seems to crop back up after a day or so.
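For reference, the toggle is just an index settings update along these lines (index name illustrative):

    PUT /inventory_suggestions/_settings
    { "index": { "number_of_replicas": 0 } }

    PUT /inventory_suggestions/_settings
    { "index": { "number_of_replicas": 1 } }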

Anyone run into this before?


Hi there,

We have seen very occasional reports of this and have been investigating, but it has proved extremely tricky for us to reproduce. We need help from someone like you, who sees this problem regularly enough to help us diagnose it.

Please could you tell us more about this cluster and the environment in which it lives? For instance: what version are you running exactly? What is it running on? How frequently are you doing the bulk-delete-and-insert that you describe? What other activity does the cluster see?

Would you be willing to run the support diagnostics tool on your cluster and share the results? Don't post them here: I'll get you an email address to use if you can run this.

Would you be able to enable the following very verbose logging, and toggle the replica count to 0 and then back to 1 to make sure everything is in sync? I say again that this is very verbose so it will cause extra I/O and may fill up your disks, so proceed with caution here.

, "logger.org.elasticsearch.action.bulk": "TRACE"
, "logger.org.elasticsearch.cluster.service": "DEBUG"
, "logger.org.elasticsearch.indices.recovery": "TRACE"
, "logger.org.elasticsearch.index.shard": "TRACE"

In case it helps, we've only so far been able to reproduce anything like this by simulating some very strange networking failures that coincide with shards being reallocated, and even then it's very sporadic.

On top of what David suggested, can you share the exact version you use?

The cluster is a 5-node cluster, all nodes running ES 5.4.0 as master/client/data, all CentOS 7.3 VMs with no plugins. The cluster has about 5 indexes in it. The largest has about 1.6M documents with a fair amount of churn and has never gotten out of sync; it updates by diffing and performing bulk updates/deletes on individual documents, which (aside from the number of documents) is the only significant difference between it and the index that is getting out of sync. The index that is causing trouble is pretty new and has just a couple thousand documents, but it is updated by deleting (via delete_by_query) and re-adding groups of documents.

Version details:
"version": {
"number": "5.4.0",
"build_hash": "780f8c4",
"build_date": "2017-04-28T17:43:27.229Z",
"build_snapshot": false,
"lucene_version": "6.5.0"
}

I'll see if I can enable verbose logging. Unfortunately, I can only reproduce this at the moment on our production cluster, so I'll have to check about running the diagnostics tool.

The indexing on the client is being done with NEST on .NET. The code looks roughly like this:

    // Build the bulk index operations for the updated terms, routed by client ID.
    var actions = terms
        .Select(term => new BulkIndexOperation<SearchTerm>(term)
        {
            Routing = term.ClientId
        })
        .Cast<IBulkOperation>();

    // Delete every existing document for this client...
    Client.Instance.DeleteByQuery(new DeleteByQueryRequest("inventory_suggestions", typeof(SearchTerm))
    {
        Query = new TermQuery
        {
            Field = typeof(SearchTerm).GetProperty(nameof(SearchTerm.ClientId)),
            Value = _client.ClientId.ToString()
        }
    });

    // ...then immediately re-add the updated documents in a single bulk request.
    Client.Instance.Bulk(new BulkRequest("inventory_suggestions")
    {
        Operations = actions.ToList()
    });

Thanks, Eric, much appreciated.

We turned on the trace logging and were able to reproduce it getting out of sync on one of the shards. Seems to only be ~15 documents off at the moment. What's the best way to share the logs?



Please could you zip them up and send them to me at david.turner@elastic.co? I'm unlikely to look at them before 0900 UTC on Monday now, so I can't promise an immediate response.

Logs sent. Thank you very much for your help.

Hi Eric,

Thanks for the logs, they're much appreciated. We have come up with one hypothesis about what might possibly be happening here, but unfortunately cannot test it from those logs alone. Could you possibly repeat the period of trace logging with the same settings, starting from a point where the shards are all in sync, wait for them to fall out of sync, and then grab a list of all the document IDs on both primary and replica as well as the logs? Ideally we'd like the indexing process to be stopped and for you to perform a refresh before querying for the doc IDs to make sure that we get everything.
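In case it's useful, one way to grab the IDs is to refresh and then run the same match_all search against the primary and the replica copies using the preference parameter, along these lines (index name and size are illustrative, and indexing should be stopped first):

    POST /inventory_suggestions/_refresh

    GET /inventory_suggestions/_search?preference=_primary&size=10000
    {
      "_source": false,
      "query": { "match_all": {} }
    }

    GET /inventory_suggestions/_search?preference=_replica&size=10000
    {
      "_source": false,
      "query": { "match_all": {} }
    }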

Many thanks,

David

Awesome. We've got the trace logging enabled as before. I'll send you those results once it starts getting out of sync again.

Thanks Eric. Could you also confirm that, in this index at least, you're using auto-generated IDs, and not using external versioning at all?

Many thanks,

David

Yeah, that's correct.

Sent you logs along with the ids on primary/replica. I stopped all indexing and did a refresh of the index prior to pulling the IDs.

Awesome. We didn't find exactly what we expected, but we weren't far off. It seems there are occasions where you index a document and delete it very soon afterwards (before the indexing operation has even returned to the client), and the indexing and deletion operations are arriving in the wrong order at the replica, and for some reason (still under investigation) they're not being put back in the right order. We can now reproduce this with a single document.

As a workaround for you for now, I think it'd be sufficient to avoid running concurrent deletion and indexing operations on your inventory_suggestions_v1 index. Could you try that?
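For instance, one simple way to make sure each step has fully completed and is visible before the next one starts is to have the delete-by-query refresh and run to completion before the bulk request is sent, roughly like this (index, type, and field names illustrative):

    POST /inventory_suggestions_v1/_delete_by_query?refresh=true&wait_for_completion=true
    {
      "query": { "term": { "clientId": "42" } }
    }

    POST /inventory_suggestions_v1/search_term/_bulk?refresh=wait_for
    { "index": { "_routing": "42" } }
    { "clientId": "42", "text": "updated suggestion" }

The important part is simply not to start the next round of deletes until the previous round of indexing has finished.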

I put some stuff in place to try to prevent concurrent indexing, and it hasn't gotten out of sync since. As a longer-term workaround, I'm also changing the indexing strategy to do more targeted updates/deletes, since I think that would decrease the churn considerably for my use case.
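Roughly, the idea is that instead of deleting everything for a client and re-adding it, I'd diff against what's already indexed and send a single bulk request with per-document index and delete operations, something like this (IDs and type name made up):

    POST /inventory_suggestions/search_term/_bulk
    { "index": { "_id": "AVx1a2b3c4", "_routing": "42" } }
    { "clientId": "42", "text": "updated suggestion" }
    { "delete": { "_id": "AVx9z8y7x6", "_routing": "42" } }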

It's an old blog post, but I've never seen it documented anywhere else. Read this: https://www.elastic.co/blog/elasticsearch-versioning-support

EDIT: As seems to be the way with this kind of "documentation", you'll want to skip straight to the last section, titled "Some final words about deletes". What you need to know is never at the start!

I'd have to guess that you're re-using _id values within the window of ES' deleted document garbage collection process -- which would be unrelated to concurrent deleting/indexing.

There are likely a few solutions, but always using an (index-wide) increasing version number for each new doc is one way to fix this. Using ES's auto-generated _ids is probably another, but I haven't confirmed that approach.
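For illustration, external versioning is supplied on each index request, roughly like this (index, type, ID, and version values are made up); Elasticsearch only applies the write if the supplied version is higher than the one it already has stored for that ID:

    PUT /my_index/my_doc/1?version=12345&version_type=external
    {
      "field": "value"
    }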

Good luck!

The documentation on the relationship between deletes and versioning is indeed quite scarce, and I agree that this should be properly spelled out in the reference manual. It is, however, not relevant in this case.

The OP is not re-using document IDs.

Assigning document IDs based on an external counter is certainly possible, but it's quite tricky to make it robust to all the things that might go wrong in your system, particularly network partitions and GC pauses. Auto-generated IDs allow Elasticsearch to do this for you, so I'd say to use that functionality unless you have a very compelling reason to use externally-assigned IDs.

Great news. Thanks for letting us know.
