Preventing ES Shard Dance

otisg · March 6, 2012, 11:13am

Hello,

We've been doing a lot of Elasticsearch performance testing lately. While
testing, we've experienced the "ES shard dance" shown in the attachment
whenever we restarted any of the nodes. This, of course, made testing hard
because we couldn't keep a fixed shard distribution between restarts and
between some of the test runs, plus it slowed us down (you can see this
shard dance took over 1 hour).

Is it possible to start Elasticsearch and tell it not to move any shards
around even if it thinks there is a better way to distribute them?

Thanks,
Otis

Hiring Elasticsearch Engineers World-Wide -- Jobs - Sematext

dobe · March 6, 2012, 12:32pm

Hi Otis,

This is described in the GitHub issue "Allow to disable shard allocations" (Issue #1358, elastic/elasticsearch).
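For readers who want to try what that issue describes, here is a minimal Python sketch of toggling allocation through the cluster settings API. It is an illustration, not taken from the thread: the setting name `cluster.routing.allocation.disable_allocation` is how I recall the option from that issue (later releases replaced it with `cluster.routing.allocation.enable`), and the helper name `build_allocation_request` and the `http://localhost:9200` address are made up for the example. Check the documentation for your version before relying on it.

```python
import json
import urllib.request


def build_allocation_request(base_url, disable):
    """Build (but do not send) a PUT to the cluster settings endpoint.

    Assumption: the setting name below matches the Elasticsearch version
    in use; newer versions use cluster.routing.allocation.enable instead.
    """
    body = {
        "transient": {
            "cluster.routing.allocation.disable_allocation": disable
        }
    }
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/_cluster/settings",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


if __name__ == "__main__":
    # Hypothetical local node address; adjust for your cluster.
    req = build_allocation_request("http://localhost:9200", True)
    print(req.get_method(), req.full_url)
    print(req.data.decode("utf-8"))
    # To actually apply the setting, send the request:
    # urllib.request.urlopen(req)
```

The idea would be to send the request with `disable=True` before restarting a node and again with `disable=False` once the node has rejoined, so shards are not shuffled in between.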


Mark_Huang · March 6, 2012, 6:09pm

Disabling replica allocation will avoid rebalancing on cluster shutdown, as dobe points out, but not on cluster restart if the nodes don't all come up around the same time. Set the gateway parameters described in the Elasticsearch documentation to give all your nodes time to initialize before recovery/rebalancing is performed. For example, in a cluster with 3 nodes and 1 replica per shard, you might set the parameters to:

gateway:
  recover_after_nodes: 2
  recover_after_time: 5m
  expected_nodes: 3

This would give the 3rd node up to 5 minutes to finish initializing before the first 2 nodes give up on it and start rebalancing.
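To make the interaction of the three gateway parameters concrete, here is a small Python model of the decision they describe. It is a simplified sketch of the behavior as explained above, not Elasticsearch's actual code; the class `GatewaySettings`, the function `should_start_recovery`, and the idea of measuring the wait in seconds are all invented for the illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GatewaySettings:
    # Values mirror the example config: 2 nodes, 5 minutes, 3 nodes.
    recover_after_nodes: int = 2
    recover_after_time_s: int = 300
    expected_nodes: int = 3


def should_start_recovery(nodes_joined, seconds_waited, settings=GatewaySettings()):
    """Return True when recovery would begin under this simplified model.

    seconds_waited is the time elapsed since recover_after_nodes was reached.
    """
    # All expected nodes are present: no reason to keep waiting.
    if nodes_joined >= settings.expected_nodes:
        return True
    # Too few nodes: recovery stays blocked no matter how long we wait.
    if nodes_joined < settings.recover_after_nodes:
        return False
    # Enough nodes to proceed, but give the stragglers the grace period.
    return seconds_waited >= settings.recover_after_time_s


if __name__ == "__main__":
    for nodes, waited in [(1, 1000), (2, 60), (2, 300), (3, 0)]:
        print(nodes, waited, should_start_recovery(nodes, waited))
```

In this model, a full cluster recovers immediately, a two-node cluster waits out the 5 minutes, and a single node never starts recovery on its own.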

--Mark


Mark_Waddle · March 7, 2012, 5:19am

Hi Otis,

What tooling are you using to gather and chart those metrics?

Mark


otisg · March 7, 2012, 5:44pm

Hi Mark,

That graph came from SPM for Elasticsearch. It's like SPM for Solr
(Sematext Apache Solr Monitoring | Performance Monitoring Tools), but with
ES metrics. It's not 100% polished, but that's happening as I type. It's
currently free and you can get it via http://apps.sematext.com/ (you can
also get free Search Analytics from there).

Otis


Paul_Brown · March 7, 2012, 5:54pm

It's not where Otis's graphs are coming from, but we get similar graphs out of OpenTSDB/tcollector attached to Elasticsearch. (We use OpenTSDB/tcollector with a simple graphite adapter and Coda Hale's metrics to gather metrics from other systems as well.)

-- Paul


kimchy · March 7, 2012, 8:50pm

++OpenTSDB

+Graphite

