Shard allocation on restarted node takes too long


#1

We followed the tutorial for rolling restarts in Elasticsearch documentation:

  1. stop shard allocation
  2. bring the node down
  3. perform maintenance
  4. start the node again
  5. wait for node to join the cluster
  6. enable reallocation
All seemed to work fine, except that it takes too long, sometimes more than 20-30 minutes, for the cluster to return to green state after a node restart.
Is there a better way to go about it?
As soon as the restarted node comes up, we see that shards from other nodes begin to relocate to it, which further slows down the initialization of that node's own shards.
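For reference, steps 1 and 6 above map onto the cluster settings API roughly like this (a sketch based on the documented `cluster.routing.allocation.enable` setting; the host and port are placeholders for your cluster):

```shell
# Step 1: disable shard allocation before stopping the node
curl -XPUT 'localhost:9200/_cluster/settings' -d '{
  "transient": {
    "cluster.routing.allocation.enable": "none"
  }
}'

# ... steps 2-5: stop the node, perform maintenance, start it again,
# and wait for it to rejoin the cluster ...

# Step 6: re-enable allocation once the node is back
curl -XPUT 'localhost:9200/_cluster/settings' -d '{
  "transient": {
    "cluster.routing.allocation.enable": "all"
  }
}'
```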

We currently have 374 monthly indices in ES, each with 5 shards and 2 replicas. Our biggest monthly index has an average size of 800 GB (including replicas). The 13 data nodes are m4.xlarge AWS instances with 1 TB of disk on each node. Is the network a bottleneck here?


(Yannick Welsch) #2
  • Are you making use of synced flush?
  • Is there indexing activity while the node is down?

You could maybe increase cluster.routing.allocation.node_concurrent_recoveries to speed up the recovery (see here).
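As a sketch, raising that setting is a transient cluster settings update (the value 4 is only illustrative; the default is 2):

```shell
# Allow more concurrent incoming/outgoing shard recoveries per node
curl -XPUT 'localhost:9200/_cluster/settings' -d '{
  "transient": {
    "cluster.routing.allocation.node_concurrent_recoveries": 4
  }
}'
```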


#3

I was not using synced flush, as ongoing indexing operations were in progress. However, I will consider increasing cluster.routing.allocation.node_concurrent_recoveries to a larger number. Is there a way to stop shards from moving from other nodes to the restarted node?
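(For context, the movement I would like to prevent is the rebalancing controlled by `cluster.routing.rebalance.enable`; something like the following, if that is the right lever, re-enabling it once recovery is done:)

```shell
# Temporarily disable shard rebalancing across the cluster
curl -XPUT 'localhost:9200/_cluster/settings' -d '{
  "transient": {
    "cluster.routing.rebalance.enable": "none"
  }
}'
```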


(Christian Dahlqvist) #4

m4.large instances are very small and have limited network performance. Having 13 such small nodes is likely to be less efficient than a smaller cluster of larger nodes, as there would be less network traffic and higher throughput per node. I would therefore recommend using larger instances, e.g. 7 m4.xlarge nodes.

If you are not continuously indexing into all indices, you will benefit from using synced flush, so make sure that you are on a version that supports it.
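Issuing a synced flush before stopping the node is a single API call (a sketch; host and port are placeholders). Shards whose sync_id still matches after the restart can skip the file-by-file recovery:

```shell
# Perform a synced flush on all indices before the node goes down
curl -XPOST 'localhost:9200/_flush/synced'
```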


#5

Sorry for the misinformation. We are using m4.xlarge instances for the data nodes.

Synced flush helps with indices that receive data very infrequently and are much smaller in size. However, inactive indices, such as those from previous months, also tend to move around the cluster after a node restart. The concurrent recoveries option did help speed up the assignment process, but some shards still take a lot of time to move. Looking at Marvel, I see several replica shards stuck in the initializing phase, moving from one node to another.

Are there any other networking configurations in ES that I might be missing? Does it make sense to increase the cluster.routing.allocation.cluster_concurrent_rebalance value to something bigger, say 15-20? What would be the constraint on this setting?
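Concretely, I mean a change like this (the value is illustrative; the default is 2, and a very high cap could saturate network or disk on small instances):

```shell
# Allow more shard relocations to run concurrently across the cluster
curl -XPUT 'localhost:9200/_cluster/settings' -d '{
  "transient": {
    "cluster.routing.allocation.cluster_concurrent_rebalance": 10
  }
}'
```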

