Very slow cluster restart

Amit · January 30, 2013, 1:31pm

Hi All,

I have an Elasticsearch cluster with the following configuration:

- 2 nodes, both data nodes
- number of replicas: 0
- number of shards per index: 5

The number of active_primary_shards on my cluster is close to 15,000.

When I restart my cluster, it takes 40 to 45 minutes to bring it back up.
I am running Elasticsearch with the default settings.

When I check the cluster health with:

curl -XGET 'http://localhost:9200/_cluster/health?pretty=true'

I see that ES initializes only 8 shards at a time (initializing_shards is 8).
Since I have 15,000 active primary shards, it takes too much time to bring
all the primary shards back into the cluster. Looking for a setting to
improve this, I found cluster.routing.allocation.node_initial_primaries_recoveries
and considered raising it to 25.
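The "8 at a time" figure matches the default: node_initial_primaries_recoveries defaults to 4 per node, and this cluster has 2 nodes. A minimal back-of-envelope sketch, assuming every primary shard takes roughly the same time to recover (a simplification, since real shards differ in size); the function names are hypothetical helpers, not part of Elasticsearch:

```shell
#!/bin/sh
# Rough restart arithmetic. Assumes all primaries recover in about the
# same time, which real clusters only approximate.

# Concurrent initial-primary recoveries for the whole cluster:
# $1 = data nodes, $2 = node_initial_primaries_recoveries per node.
concurrent_recoveries() {
  echo $(( $1 * $2 ))
}

# Number of recovery "waves": ceil(shards / concurrent).
# $1 = primary shards, $2 = concurrent recoveries.
recovery_waves() {
  echo $(( ($1 + $2 - 1) / $2 ))
}

# Average milliseconds per wave, integer-truncated.
# $1 = observed restart time in minutes, $2 = number of waves.
ms_per_wave() {
  echo $(( $1 * 60000 / $2 ))
}
```

For this cluster, `recovery_waves 15000 "$(concurrent_recoveries 2 4)"` prints 1875, and `ms_per_wave 45 1875` prints 1440, so the 45-minute restart averages about 1.4 s per wave of 8 shards. With 25 per node the same arithmetic gives 300 waves, but only if the disks and CPUs can actually sustain 50 concurrent recoveries at a similar speed.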

But I am not sure: should I change this value?

Please help me understand the different settings that would allow a faster
cluster restart. Is there any formula or logic I can use to change the
default settings for a faster restart?

Thanks in advance

Amit

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

radu_gheorghe · January 30, 2013, 4:01pm

Hello Amit,

Since you have no replicas, increasing node_initial_primaries_recoveries
is pretty much the only recovery setting you can tune. There's no real
formula for it, since it depends on lots of variables, such as the number
and size of shards, the node hardware, and whether or not the nodes will be
hit by queries...

What you can do here is increase that value and observe the load and the
startup time/pace when restarting the cluster. When the load gets high (and
you see the startup pace drop, so restarts get slower instead of faster),
it means you have too many concurrent recoveries.
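A minimal shell sketch for watching the startup pace while trying different values, using only the _cluster/health call from the original question. It assumes the cluster answers on http://localhost:9200 (ES_URL is a placeholder variable), and health_field and watch_recovery are hypothetical helpers; the load itself still has to be watched on the nodes with OS tools such as top or iostat.

```shell
#!/bin/sh
# Hypothetical helpers for observing recovery progress during a restart.
ES_URL=${ES_URL:-http://localhost:9200}

# Print the numeric value of one field from _cluster/health JSON on stdin.
# tr joins the pretty-printed lines; sed captures the digits after "<field>" :
# Prints nothing if the field is absent.
health_field() {
  tr -d '\n' | sed -n "s/.*\"$1\" *: *\([0-9][0-9]*\).*/\1/p"
}

# Poll every $1 seconds (default 10) and print a timestamp with the
# initializing and unassigned shard counts; stop once both reach 0.
watch_recovery() {
  interval=${1:-10}
  while :; do
    health=$(curl -s "$ES_URL/_cluster/health?pretty=true")
    init=$(printf '%s' "$health" | health_field initializing_shards)
    unassigned=$(printf '%s' "$health" | health_field unassigned_shards)
    echo "$(date '+%H:%M:%S') initializing=$init unassigned=$unassigned"
    [ "$init" = 0 ] && [ "$unassigned" = 0 ] && break
    sleep "$interval"
  done
}
```

Running `watch_recovery 10` during restarts with different settings and comparing how quickly unassigned_shards falls gives a direct measure of the pace: if a higher value no longer makes it fall faster, the nodes are saturated.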

The other area to look at is the design of your indices. You have lots of
them (3000, I assume, given 5 shards each), so I'd look at ways to get that
number much lower. Lots of indices mean lots of shards and, in turn, lots
of segments, which means you have a huge number of files to be read during
recovery. By reducing the number of files you should get better recovery
times thanks to fewer disk seeks.
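The "3000 I assume" estimate is simple division, sketched below under the same assumption that every index uses the default 5 primary shards; index_count and total_shards are hypothetical helper names:

```shell
#!/bin/sh
# Shard arithmetic behind the index-design point. Assumes every index
# has the same number of primary shards.

# Indices implied by a total primary-shard count.
# $1 = total primary shards, $2 = shards per index.
index_count() {
  echo $(( $1 / $2 ))
}

# Total primary shards if those indices used a different shard count.
# $1 = number of indices, $2 = shards per index.
total_shards() {
  echo $(( $1 * $2 ))
}
```

`index_count 15000 5` prints 3000, and `total_shards 3000 1` also prints 3000: re-creating the same indices with one shard each would leave a fifth as many shards to recover on restart, before even reducing the number of indices.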

And besides the recovery, shards come with a memory overhead, and the more
shards your searches hit, the slower they get.

If it's not possible to reduce the number of indices, or at least not in
the short term, here are some other things you can do:

- remove (or back up) your indices, re-create them with 1 shard each
  instead of 5, then reindex
- if some of your indices don't change (e.g. time-based indices), you can
  optimize them and set max_num_segments to 1
- if some indices are still modified, you can change the merge policy
  instead

For example, with the default "tiered" policy, you can set something
like index.merge.policy.segments_per_tier=4 instead of the default 10. For
this to work, you'll also have to set index.merge.policy.max_merge_at_once=4,
because segments_per_tier must not be lower than max_merge_at_once
(otherwise too many merges are forced).

Note that tuning the merge policy for fewer segments will cause more
merging, which will increase I/O usage while indexing or deleting docs.
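The first two suggestions in the list can be sketched as shell helpers around the 0.20-era REST API. The helper names and index names are hypothetical placeholders, the reindexing step itself is not shown, and later Elasticsearch versions replaced the _optimize endpoint with _forcemerge, so check the API of the version you run.

```shell
#!/bin/sh
# Hypothetical helpers; index names passed as $1 are placeholders.
ES_URL=${ES_URL:-http://localhost:9200}

# Settings for a new index with 1 primary shard and no replicas
# (0 replicas mirrors the cluster described in the question).
one_shard_body() {
  echo '{"settings": {"number_of_shards": 1, "number_of_replicas": 0}}'
}

# Create index $1 with a single shard. Documents must then be
# reindexed into it from the old 5-shard index (not shown).
create_one_shard_index() {
  curl -s -XPUT "$ES_URL/$1" \
    -H 'Content-Type: application/json' -d "$(one_shard_body)"
}

# Merge an index that no longer changes down to one segment per shard.
optimize_to_one_segment() {
  curl -s -XPOST "$ES_URL/$1/_optimize?max_num_segments=1"
}
```

Optimizing is I/O-heavy, so it is best run on one read-only index at a time, outside peak hours.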

Best regards,
Radu

http://sematext.com/ -- Elasticsearch -- Solr -- Lucene


Amit · January 31, 2013, 2:08pm

Thanks a ton, Radu!
I will try increasing node_initial_primaries_recoveries and see how the
cluster restart behaves.
Meanwhile, the indices are already indexed, so I cannot do much with the
Lucene merge policy. But for new indices I will try tuning the merge policy.

Thanks
Amit


radu_gheorghe · January 31, 2013, 5:45pm

Hi Amit,

You're welcome!

Please note that you can actually change the merge policy of an existing
index via the Indices Update Settings API.
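A minimal sketch of that call, using the segments_per_tier and max_merge_at_once values suggested earlier in the thread (4 each). It assumes the 0.20-era update settings endpoint (PUT /{index}/_settings); ES_URL, merge_policy_body and update_merge_policy are hypothetical names, and whether these settings are still dynamic in newer versions should be checked against their documentation.

```shell
#!/bin/sh
# Hypothetical helpers for changing the merge policy of an existing index.
ES_URL=${ES_URL:-http://localhost:9200}

# Settings body; $1 = value for both segments_per_tier and
# max_merge_at_once (default 4), keeping segments_per_tier >= max_merge_at_once.
merge_policy_body() {
  n=${1:-4}
  printf '{"index": {"merge.policy.segments_per_tier": %s, "merge.policy.max_merge_at_once": %s}}\n' "$n" "$n"
}

# Apply the new merge policy to existing index $1 ($2 = optional value).
update_merge_policy() {
  curl -s -XPUT "$ES_URL/$1/_settings" \
    -H 'Content-Type: application/json' -d "$(merge_policy_body "${2:-4}")"
}
```

The new policy only affects future merges; segments that already exist are merged down gradually as indexing continues, with the extra I/O noted earlier.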

Best regards,
Radu

