Restarting a node takes a long time


(Ankit Jain) #1

Hi All,

We have deployed a 5-node cluster, and each node is serving around 200
shards (there are 200 indices in total, and each index has 200 shards).

While restarting, each node takes around 20 to 30 minutes to move all
shards from the unassigned state to the assigned state.

How can we move all the shards from the unassigned state to the assigned
state more quickly?

Is the recovery time dependent on data size?

Regards,
Ankit Jain

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.


(Ivan Brusic) #2

Are the restarts planned, or are they server crashes? If they are planned,
you should disable indexing (if possible), flush the index, and
temporarily disable allocation.
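
As a rough sketch of the planned-restart sequence above, the following lists the REST calls in order without sending them. The setting name `cluster.routing.allocation.enable` is an assumption that fits newer releases; 0.90-era clusters used `cluster.routing.allocation.disable_allocation` instead, so check the documentation for your version. The helper names are hypothetical. Pausing indexing happens on the client side and is not shown.

```python
import json

# Assumption: newer releases use "cluster.routing.allocation.enable";
# 0.90-era clusters used "cluster.routing.allocation.disable_allocation".
ALLOCATION_SETTING = "cluster.routing.allocation.enable"


def allocation_body(enabled):
    """JSON body for PUT /_cluster/settings that turns shard allocation on or off."""
    value = "all" if enabled else "none"
    return json.dumps({"transient": {ALLOCATION_SETTING: value}})


def planned_restart_steps():
    """Ordered (method, path, body) tuples for one planned node restart."""
    return [
        # Flush all indices so that little needs replaying on startup.
        ("POST", "/_flush", None),
        # Stop the cluster from reallocating shards while the node is down.
        ("PUT", "/_cluster/settings", allocation_body(False)),
        # ... restart the node here and wait for it to rejoin the cluster ...
        # Allow allocation again so the node's shards are assigned.
        ("PUT", "/_cluster/settings", allocation_body(True)),
    ]


for method, path, body in planned_restart_steps():
    print(method, path, body or "")
```

Each tuple can be sent with any HTTP client; the settings are transient, so they do not survive a full cluster restart.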

Here is a repost of something I wrote two days ago:

Elasticsearch has been throttling I/O recovery since version 0.90. The
defaults are fairly low. Try increasing the
indices.recovery.max_bytes_per_sec setting.

http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/cluster-update-settings.html

You can also increase the number of shards that are recovered at the same
time. The default is 2. Increase either value too much and you will have
long IO waits.
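
A minimal sketch of the request body for the cluster update settings API linked above, assuming the concurrency knob is cluster.routing.allocation.node_concurrent_recoveries (the setting named later in this thread). The helper name and the values "100mb" and 4 are illustrative only, not recommendations.

```python
import json


def recovery_settings_body(max_bytes_per_sec, concurrent_recoveries=None):
    """Build a transient cluster-settings body that loosens recovery throttling."""
    settings = {"indices.recovery.max_bytes_per_sec": max_bytes_per_sec}
    if concurrent_recoveries is not None:
        # Assumed to be the "shards recovered at the same time" setting.
        settings["cluster.routing.allocation.node_concurrent_recoveries"] = (
            concurrent_recoveries
        )
    return json.dumps({"transient": settings}, sort_keys=True)


# Illustrative values; raise them gradually and watch IO wait on the nodes.
print(recovery_settings_body("100mb", 4))
```

The resulting JSON would be sent with PUT /_cluster/settings. Using "transient" means the change is lost on a full cluster restart, which makes it easy to back out.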

Cheers,

Ivan



(Ankit Jain) #3

Thanks Ivan.

We are planning to store terabytes of data in the Elasticsearch cluster. I
don't think indices.recovery.max_bytes_per_sec is going to help me much.

Are there any specific files or metadata that an ES node requires during
restart?

Also, can you suggest some alternative ways to improve node recovery time?

Regards,
Ankit Jain



(Jörg Prante) #4

You have decided to put 200 shards on a single node. Such a high number
combined with a significant shard size cannot be recovered very quickly.
Although you can put many thousands of shards on a single node and shards
can grow to many hundreds of GB, you always pay a price when the shards
start moving around between nodes at recovery time.

The improvement depends on how much you want to stress a node while it is
recovering. The default settings are chosen wisely, so that when a node
comes up, it can always respond to search and index requests immediately
during recovery. It does not confuse clients with timeouts, and it does
not confuse sysadmins with iowaits. But there is a long downtime, and this
downtime is directly determined by the number of shards per node and the
current shard size.

So if you want to stress your nodes, you can select a higher value in
cluster.routing.allocation.node_concurrent_recoveries.

http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/modules-cluster.html

Pros:

  • recovery may be faster

Cons:

  • recovery consumes a lot of network resources
  • queries and indexing may not respond in time
  • higher iowaits
  • network bandwidth must be available

The best way to achieve quick recovery is to choose a sensible
shards-per-node ratio and a sane shard size.

Here is my approach: on my 32-core machines, I plan to never run more than
32 shards, and no shard shall grow beyond 5-10 GB, so transporting one over
a 10 Gbit/s link takes a reasonable time.
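
To make the arithmetic behind that sizing concrete, here is a back-of-the-envelope sketch. It assumes a shard has to be copied in full over the network at the full link speed, which is a lower bound: recovery throttling, disk speed, and protocol overhead all make real recoveries slower. The function names are hypothetical.

```python
def transfer_seconds(size_gb, link_gbit_per_s):
    """Ideal time to copy size_gb gigabytes over a link of link_gbit_per_s gigabits/s."""
    return size_gb * 8 / link_gbit_per_s  # 8 bits per byte


def node_recovery_seconds(shards, shard_size_gb, link_gbit_per_s):
    """Ideal time to copy every shard of a node, one after another, at full link speed."""
    return shards * transfer_seconds(shard_size_gb, link_gbit_per_s)


print(transfer_seconds(10, 10))           # one 10 GB shard on 10 Gbit/s: 8.0
print(node_recovery_seconds(32, 10, 10))  # 32 such shards: 256.0
```

The estimate scales linearly with both shard count and shard size, which is why recovery time depends directly on how much data a node holds.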

Jörg



(shadyabhi) #5

How are you restarting your nodes? Are you using init scripts or the nodes
shutdown API
(http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/cluster-nodes-shutdown.html)?
Using the API seems to give pretty fast recovery for me.


--
Regards,
Abhijeet Rastogi (shadyabhi)



(Ankit Jain) #6

Thanks Jörg and Abhijeet.

@Abhijeet, we are considering the scenario of server crashes.
Also, the amount of data served by each shard is around 100 GB.

Regards,
Ankit Jain


