We followed the tutorial for rolling restarts in the Elasticsearch documentation:
- stop shard allocation
- bring the node down
- perform maintenance
- start the node again
- wait for node to join the cluster
- enable reallocation
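
For reference, the steps above can be sketched roughly as follows. This is a minimal sketch, not our exact script: it assumes the cluster is reachable at a hypothetical `http://localhost:9200` without authentication, and `restart_node` is a placeholder callback standing in for whatever actually stops the node, performs maintenance, and starts it again. The helper names are made up for illustration; the cluster settings and cluster health endpoints are the ones the rolling restart procedure relies on.

```python
import json
import urllib.error
import urllib.request

ES_URL = "http://localhost:9200"  # assumption: hypothetical cluster address


def allocation_settings(mode):
    """Build the body for PUT /_cluster/settings that sets shard allocation."""
    if mode not in ("all", "primaries", "new_primaries", "none"):
        raise ValueError(f"unsupported allocation mode: {mode}")
    return {"transient": {"cluster.routing.allocation.enable": mode}}


def health_url(base_url=ES_URL, status=None, nodes=None, timeout="60s"):
    """Build a cluster health URL that blocks until a condition or the timeout."""
    params = [f"timeout={timeout}"]
    if status is not None:
        params.append(f"wait_for_status={status}")
    if nodes is not None:
        params.append(f"wait_for_nodes={nodes}")
    return f"{base_url}/_cluster/health?" + "&".join(params)


def put_cluster_settings(body, base_url=ES_URL):
    """Send the settings body to the cluster and return the parsed response."""
    req = urllib.request.Request(
        f"{base_url}/_cluster/settings",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def get_health(url):
    """Fetch cluster health; some versions answer a wait timeout with HTTP 408."""
    try:
        with urllib.request.urlopen(url) as resp:
            return json.load(resp)
    except urllib.error.HTTPError as err:
        if err.code == 408:
            return json.load(err)  # the body still carries the health document
        raise


def rolling_restart_one_node(restart_node, expected_nodes):
    """Run the listed steps for a single node."""
    put_cluster_settings(allocation_settings("none"))   # stop shard allocation
    restart_node()                                       # down, maintenance, up
    # wait for the node to rejoin the cluster
    while get_health(health_url(nodes=expected_nodes)).get("timed_out"):
        pass
    put_cluster_settings(allocation_settings("all"))    # enable reallocation
    # block until the cluster reports green again
    while get_health(health_url(status="green")).get("timed_out"):
        pass
```

Nothing runs on import; `rolling_restart_one_node` would be called once per node, with `expected_nodes` set to the total node count of the cluster.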
Everything seemed to work fine, except that it takes too long, 20-30 minutes or even more, for the cluster to return to green state after a node restart.
Is there a better way to go about it?
As soon as the restarted node comes up, we see shards from other nodes begin to be allocated to it, which further slows the initialization of the restarted node's own shards.
We currently have 374 monthly indices on ES, each with 5 shards and 2 replicas. Our biggest monthly index has an average size of 800 GB (including replicas). The 13 data nodes are m4.xlarge AWS instances with 1 TB of disk on each node. Is the network a bottleneck here?
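
To put those numbers in perspective, here is a quick back-of-the-envelope calculation of how many shard copies the cluster holds. It is only a sketch under two assumptions: "2 replicas" means two replica copies per primary (three copies in total), and shards are spread evenly across the data nodes, which may not match the real distribution.

```python
def shard_counts(indices, primaries_per_index, replicas_per_primary, data_nodes):
    """Return (primary shards, total shard copies, average copies per node)."""
    primaries = indices * primaries_per_index
    total_copies = primaries * (1 + replicas_per_primary)
    return primaries, total_copies, total_copies / data_nodes


# Figures from the post; replicas_per_primary=2 is the assumed interpretation.
primaries, total_copies, per_node = shard_counts(
    indices=374, primaries_per_index=5, replicas_per_primary=2, data_nodes=13
)
print(f"primary shards:       {primaries}")
print(f"total shard copies:   {total_copies}")
print(f"avg copies per node:  {per_node:.0f}")
```

Under these assumptions that is 1870 primary shards and 5610 shard copies overall, so a restarted node would have several hundred shard copies to bring back before the cluster can go green.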