We are using Elasticsearch 2.4.2.
We have around 85 indices with 10 shards per index, and a total of 30 TB of data.
3 master nodes
3 client nodes
18 data nodes (each with 3 TB of disk space and 64 GB of RAM, 32 GB of which is allocated to ES).
If I follow the rolling restart process, with indexing disabled and a synced flush, recovery takes around 15 minutes.
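For reference, the calls around a single node restart can be sketched roughly as below. This is only a sketch, assuming the procedure follows the documented rolling restart steps (disable shard allocation, synced flush, restart, re-enable allocation). The `ES_URL` address and the helper names are assumptions for illustration, not part of our actual tooling.

```python
import json
import urllib.request

ES_URL = "http://localhost:9200"  # assumption: address of one of the client nodes


def rolling_restart_steps():
    """Return the REST calls made around one node restart, in order."""
    return [
        # Stop the cluster from moving shards while the node is down.
        ("PUT", "/_cluster/settings",
         {"transient": {"cluster.routing.allocation.enable": "none"}}),
        # Synced flush, issued after indexing has been stopped.
        ("POST", "/_flush/synced", None),
        # The node itself is restarted here, outside of this script.
        ("PUT", "/_cluster/settings",
         {"transient": {"cluster.routing.allocation.enable": "all"}}),
    ]


def build_request(method, path, body=None, base_url=ES_URL):
    """Build (but do not send) an HTTP request for one step."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    req = urllib.request.Request(base_url + path, data=data, method=method)
    if data is not None:
        req.add_header("Content-Type", "application/json")
    return req


def send(method, path, body=None):
    """Send one step to the cluster and return the parsed JSON response."""
    with urllib.request.urlopen(build_request(method, path, body)) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Calling `send(*step)` for each entry of `rolling_restart_steps()`, with the restart done between the second and third step, would reproduce the sequence. Nothing is sent until `send` is called.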
However, if any node leaves the cluster and comes back, say due to a network issue or any other issue, recovery takes more than 3 hours (with indexing on).
I was monitoring the stats today and noticed that shard initialisation alone took 3 hours, and no reallocation was done.
My question is: why does re-initialisation from the local node take more than 3 hours? Are there any settings we are missing?
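In case it helps to see where the time goes, a summary like the following could be built from the indices recovery API (`GET /_recovery`), which reports a `stage` for each shard. This is a minimal sketch: the function name and the sample data are made up for illustration and are not our real output.

```python
from collections import Counter


def count_shards_by_stage(recovery_response):
    """Count recovering shards per stage from a parsed /_recovery response."""
    counts = Counter()
    for index_info in recovery_response.values():
        for shard in index_info.get("shards", []):
            counts[shard.get("stage", "UNKNOWN")] += 1
    return dict(counts)


# Made-up sample in the shape of a /_recovery response (index name -> shards).
sample = {
    "logs-1": {"shards": [{"id": 0, "stage": "TRANSLOG"},
                          {"id": 1, "stage": "DONE"}]},
    "logs-2": {"shards": [{"id": 0, "stage": "TRANSLOG"}]},
}
print(count_shards_by_stage(sample))
```

Polling this every minute or so would show whether shards sit mostly in one stage during the 3 hours.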