We have around 85 indices with 10 shards per index and a total of 30 TB of data.
We have
3 master nodes
3 client nodes
18 data nodes (each with 3 TB disk space and 64 GB RAM; 32 GB allocated to ES).
If I follow the rolling restart process with indexing disabled and a synced flush, recovery takes around 15 minutes.
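Roughly the steps I follow for the rolling restart; this is just a sketch, assuming a 6.x cluster where synced flush is still available, and localhost:9200 stands in for one of the client nodes:

```
# 1. Disable replica reallocation before stopping a node
curl -X PUT "localhost:9200/_cluster/settings" -H 'Content-Type: application/json' -d'
{
  "transient": {
    "cluster.routing.allocation.enable": "primaries"
  }
}'

# 2. Stop indexing, then issue a synced flush so shard copies can be reused on restart
curl -X POST "localhost:9200/_flush/synced"

# 3. Restart the node, wait for it to rejoin, then re-enable allocation
curl -X PUT "localhost:9200/_cluster/settings" -H 'Content-Type: application/json' -d'
{
  "transient": {
    "cluster.routing.allocation.enable": "all"
  }
}'
```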
However, if a node leaves the cluster and comes back (say due to a network issue or any other problem), recovery takes more than 3 hours (with indexing on).
I was monitoring the stats today and noticed that shard initialisation alone took 3 hours, and no reallocation was done.
My question is: why is re-initialisation from the local node taking more than 3 hours? Are there any settings we are missing?
If you are indexing into all, or at least a large portion of, the indices, synced flush will not help and the shards will need to be copied over in full, which probably explains the much longer recovery time.
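If you want to confirm that, the cat recovery API shows per-shard progress and whether files are being copied over the network rather than reused locally (a sketch; localhost stands in for one of your nodes):

```
# Show only in-flight recoveries, with stage, type and file/byte/translog progress
curl -X GET "localhost:9200/_cat/recovery?v&active_only=true&h=index,shard,stage,type,files_percent,bytes_percent,translog_ops_percent"
```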
So, in case of a network failure, with delayed allocation set to 5m:
If we detect that a data node has left the cluster (using some monitoring tools) and then stop indexing (this would be after the data node has left the cluster), would that help recovery?
Basically, would stopping indexing after the node has left the cluster help recovery?
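For reference, this is how we set the delayed allocation; a sketch applied to all indices, with localhost standing in for a client node:

```
# Delay replica reallocation for 5 minutes after a node leaves,
# so a node that comes straight back can reuse its local shard copies
curl -X PUT "localhost:9200/_all/_settings" -H 'Content-Type: application/json' -d'
{
  "settings": {
    "index.unassigned.node_left.delayed_timeout": "5m"
  }
}'
```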
I suspect the shards would still diverge, so I am not sure that would help. If you had indices that you were not actively indexing into, those should recover faster. What type of data do you have in the cluster?