Shard sync_id keeps changing in read-only cluster (ES 5.5)

I was hoping to use synced flush to speed up recovery after a node restart, but I have not been able to make it work. Is it possible to make sure that sync_id does not change for any shards, to avoid copying indices across the network?

I have tried the following without any success:

  • cluster.routing.allocation.enable is set to none
  • cluster.blocks.read_only is set to true
  • synced flush has been performed successfully on all indices
  • all external processes that index, delete or change anything have been shut down
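For reference, the steps above correspond roughly to the REST calls sketched below. This is only an illustration: the helper name restart_prep_requests is made up, the use of transient (rather than persistent) settings is an assumption, and the order simply follows the list above, since the post does not say in which order the steps were actually applied.

```python
import json


def restart_prep_requests():
    """Return (method, path, body) tuples for the preparation steps.

    Hypothetical helper: it only builds the requests, it does not send them.
    """
    return [
        # Stop the cluster from reallocating shards while a node is down.
        ("PUT", "/_cluster/settings",
         json.dumps({"transient": {"cluster.routing.allocation.enable": "none"}})),
        # Put the whole cluster into read-only mode.
        ("PUT", "/_cluster/settings",
         json.dumps({"transient": {"cluster.blocks.read_only": True}})),
        # Synced flush on all indices (no request body).
        ("POST", "/_flush/synced", None),
    ]


for method, path, body in restart_prep_requests():
    print(method, path, body or "")
```

Each tuple can be sent with curl or any HTTP client against one of the nodes.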

Some properties of the cluster:

  • 5TB of data
  • 10 nodes
  • 4000 indices of uneven size.
  • 50-100 writes (indexing+delete) per second
  • Elasticsearch 5.5

Recovery of just one node takes several hours. I know that recovery speed can be tuned with various settings, but copying over the network will always be much slower than reusing local files.

Is this working as expected? Is there anything else I can do to avoid copying indices across the network when a node is restarted?

Thanks!

Further debugging showed that Elasticsearch always sets new sync_ids when it marks indices as inactive (which happens after 5 minutes of no indexing activity by default). It doesn't matter whether the index has already been synced-flushed manually and nothing needs to be synced.

So instead of performing a synced flush after shutting down indexing, it is much quicker to wait out the inactivity timeout (indices.memory.shard_inactive_time, 5 minutes by default) before restarting a node.
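One way to confirm that the sync_ids have settled before a restart is to take two snapshots of GET /_stats?level=shards a few minutes apart and compare them. The sketch below assumes the response has the shape indices -> index name -> shards -> shard number -> list of copies, each copy carrying routing.node and commit.user_data.sync_id. The function names and the sample data are made up for illustration.

```python
def extract_sync_ids(stats):
    """Map (index, shard, node) -> sync_id from a parsed _stats?level=shards response.

    Copies without a sync_id in their commit user data are mapped to None.
    """
    result = {}
    for index, index_stats in stats.get("indices", {}).items():
        for shard, copies in index_stats.get("shards", {}).items():
            for copy in copies:
                node = copy.get("routing", {}).get("node")
                user_data = copy.get("commit", {}).get("user_data", {})
                result[(index, shard, node)] = user_data.get("sync_id")
    return result


def changed_sync_ids(before, after):
    """Return the sorted shard copies whose sync_id differs between two snapshots."""
    old, new = extract_sync_ids(before), extract_sync_ids(after)
    return sorted(key for key in old if key in new and old[key] != new[key])


def _copy(node, sync_id):
    # Build one shard-copy entry in the assumed response shape (sample data only).
    return {"routing": {"node": node}, "commit": {"user_data": {"sync_id": sync_id}}}


before = {"indices": {"logs": {"shards": {"0": [_copy("n1", "a"), _copy("n2", "a")]}}}}
after = {"indices": {"logs": {"shards": {"0": [_copy("n1", "b"), _copy("n2", "a")]}}}}

print(changed_sync_ids(before, after))
```

An empty list from two snapshots taken after the inactivity timeout has passed suggests the sync_ids are no longer changing.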

See https://github.com/elastic/elasticsearch/issues/27838
