Restarting of node taking much time

Ankit_Jain · October 9, 2013, 12:06pm

Hi All,

We have deployed a 5-node cluster, and each node is serving around 200 shards
(there are 200 indices in total, and each index has 200 shards).

When restarting, each node takes around 20 to 30 minutes to move all
shards from the unassigned state to the assigned state.

How can we quickly move all the shards from the unassigned state to the
assigned state?

Is the recovery time dependent on data size?

Regards,
Ankit Jain

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

Ivan · October 9, 2013, 2:45pm

Are the restarts planned or are they server crashes? If they are planned,
you should disable indexing (if possible), flush the index, and
temporarily disable allocation.

Here is a repost of something I wrote two days ago:

Elasticsearch has been throttling recovery I/O since version 0.90. The
defaults are fairly low. Try increasing the
indices.recovery.max_bytes_per_sec setting.

You can also increase the number of shards that are recovered at the same
time. The default is 2. Increase either value too much and you will have
long IO waits.

Cheers,

Ivan

Ankit_Jain · October 9, 2013, 4:54pm

Thanks Ivan.

We are planning to store terabytes of data in the Elasticsearch cluster. I
don't think indices.recovery.max_bytes_per_sec is going to help me much.

Are there any specific files that an ES node requires during restart, or any
specific metadata required during restart?

Also, can you suggest some alternative ways to improve node recovery time?

Regards,
Ankit Jain

jprante · October 9, 2013, 5:59pm

You have decided to put 200 shards on a single node. Such a high number,
combined with a significant shard size, cannot be recovered very quickly.
Although you can put many thousands of shards on a single node and shards can
grow to many hundreds of GB, you always have a price to pay when the
shards start moving around between nodes at recovery time.

The improvement depends on how much you want to stress a node while it is
recovering. The default settings are chosen wisely so that when a node comes
up, it can always respond to search and index requests immediately during
recovery. It does not confuse clients with timeouts, and it does not
confuse sysadmins with iowaits. But there is a long downtime, and this
downtime is directly determined by the number of shards per node and the
current shard size.

So if you want to stress your nodes, you can select a higher value in
cluster.routing.allocation.node_concurrent_recoveries.

http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/modules-cluster.html

Pros:

- recovery may be faster

Cons:

- recovery takes many network resources
- queries and indexing may not respond in time
- higher iowaits
- network bandwidth must be available
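
For illustration, here is a hedged sketch of how the
cluster.routing.allocation.node_concurrent_recoveries setting could be raised
through the cluster settings API. It only builds the HTTP request and does
not send it. The function name, the localhost:9200 address in the usage note,
and the value 4 are assumptions made for the example.

```python
import json
import urllib.request


def concurrent_recoveries_request(base_url, recoveries):
    """Build (but do not send) a PUT request that changes the number of
    concurrent shard recoveries allowed per node."""
    if recoveries < 1:
        raise ValueError("recoveries must be at least 1")
    body = {
        "transient": {
            "cluster.routing.allocation.node_concurrent_recoveries": recoveries
        }
    }
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/_cluster/settings",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )
```

Calling concurrent_recoveries_request("http://localhost:9200", 4) and passing
the result to urllib.request.urlopen would apply the change as a transient
setting, so it would not survive a full cluster restart. Given the cons
listed above, it seems safer to raise the value in small steps while watching
iowait.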

The best method to achieve quick recovery is to select a wise shards-per-node
ratio and a sane shard size.

Here is my approach: on my 32-core machines, I plan to never run more than
32 shards, and no shard shall grow beyond 5-10 GB, so transporting a shard
over a 10 Gbit/s link takes a reasonable time.
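
As rough arithmetic for the sizing rule above, the sketch below estimates the
best-case time to copy shard data over the wire. It assumes decimal units,
ignores protocol overhead and disk speed, and uses a 20 MB/s throttle purely
as an illustrative value, so the results are lower bounds, not predictions.

```python
GB = 10**9      # bytes, decimal units
GBIT = 10**9    # bits per second
MB = 10**6      # bytes


def link_seconds(size_bytes, link_bits_per_sec):
    """Best-case transfer time when only the network link limits the copy."""
    return size_bytes * 8 / link_bits_per_sec


def throttled_seconds(size_bytes, throttle_bytes_per_sec):
    """Best-case transfer time when a recovery throttle limits the copy."""
    return size_bytes / throttle_bytes_per_sec


# One 10 GB shard over a 10 Gbit/s link.
one_shard = link_seconds(10 * GB, 10 * GBIT)          # 8.0 seconds

# A full node of 32 such shards over the same link.
full_node = link_seconds(32 * 10 * GB, 10 * GBIT)     # 256.0 seconds

# The same 10 GB shard under an illustrative 20 MB/s throttle.
throttled = throttled_seconds(10 * GB, 20 * MB)       # 500.0 seconds

if __name__ == "__main__":
    print(one_shard, full_node, throttled)
```

The gap between the link-limited and throttle-limited numbers suggests why
both shard size and the recovery throttle matter for how long a node takes to
come back.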

Jörg

shadyabhi · October 10, 2013, 7:50am

How are you restarting your nodes? Are you using init scripts or the API?
Using the API seems to give pretty fast recovery for me.

--
Regards,
Abhijeet Rastogi (shadyabhi)

Ankit_Jain · October 10, 2013, 10:38am

Thanks Jörg and Abhijeet.

@Abhijeet, we are considering the scenario of server crashes.
Also, the amount of data served by each shard is around 100 GB.

Regards,
Ankit Jain
