Preventing ES Shard Dance

Hello,

We've been doing a lot of ElasticSearch performance testing lately. While
testing, we've experienced the "ES shard dance" shown in the attachment
whenever we restarted any of the nodes. This, of course, made testing hard
because we couldn't keep a fixed shard distribution between restarts and
between some of the test runs, plus it slowed us down (you can see this
shard dance took over 1 hour).

Is it possible to start ElasticSearch and tell it not to move any shards
around even if it thinks there is a better way to distribute them?

Thanks,
Otis

Hiring ElasticSearch Engineers World-Wide --

Hi Otis,

This is described in the GitHub issue "Allow to disable shard allocations" (Issue #1358, elastic/elasticsearch).

(In reply to Otis Gospodnetic's post of Tuesday, March 6, 2012 12:13:24 PM UTC+1.)
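For anyone who wants to try the allocation switch from that issue, here is a minimal sketch of the request one might send to the cluster settings endpoint. The setting name `cluster.routing.allocation.disable_allocation` is my recollection of what the issue introduced and should be treated as an assumption; later Elasticsearch releases replaced it with `cluster.routing.allocation.enable`, so check the documentation for your version. The helper name and the `localhost:9200` address are hypothetical. The function only builds the request and does not send it.

```python
import json
import urllib.request


def allocation_toggle_request(
    base_url,
    disabled,
    setting="cluster.routing.allocation.disable_allocation",  # assumed name, verify for your version
):
    """Build (but do not send) a PUT to the cluster settings endpoint."""
    # "transient" settings do not survive a full cluster restart.
    body = json.dumps({"transient": {setting: disabled}}).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/_cluster/settings",
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


if __name__ == "__main__":
    req = allocation_toggle_request("http://localhost:9200", True)
    print(req.get_method(), req.full_url)
    print(req.data.decode("utf-8"))
    # To actually apply it against a live node: urllib.request.urlopen(req)
```

Calling the helper again with `disabled=False` builds the request that turns allocation back on once the test run is finished.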

Disabling replica allocation will avoid rebalancing on cluster shutdown, as dobe points out, but not on cluster restart if the nodes don't all come up at around the same time. To give all your nodes time to initialize before recovery and rebalancing start, set the gateway parameters described in the Elasticsearch documentation. For example, in a cluster with 3 nodes and 1 replica per shard, you might set the parameters to:

gateway:
  recover_after_nodes: 2
  recover_after_time: 5m
  expected_nodes: 3

This gives the 3rd node up to 5 minutes to finish initializing before the first 2 nodes give up on it and start rebalancing.

--Mark

(In reply to dobe's post of Mar 6, 2012, 4:32 AM.)
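To make the interaction between the three gateway settings in Mark's example easier to follow, here is a small Python model of the decision they describe. It is a simplified sketch of the documented behaviour, not Elasticsearch's actual implementation; the class name, function name, and the `seconds_waited` argument (time elapsed since `recover_after_nodes` was first satisfied) are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class GatewaySettings:
    # Defaults mirror the example values from the thread.
    recover_after_nodes: int = 2
    recover_after_time_s: float = 300.0  # "5m"
    expected_nodes: int = 3


def should_start_recovery(settings, nodes_joined, seconds_waited):
    """Return True if this simplified model would begin recovery now."""
    # All expected nodes are present: no reason to wait any longer.
    if nodes_joined >= settings.expected_nodes:
        return True
    # Too few nodes: never recover, no matter how long we have waited.
    if nodes_joined < settings.recover_after_nodes:
        return False
    # Enough nodes to proceed, but keep waiting for stragglers until the timeout.
    return seconds_waited >= settings.recover_after_time_s


if __name__ == "__main__":
    s = GatewaySettings()
    for nodes, waited in [(1, 1000), (2, 60), (2, 300), (3, 0)]:
        print(nodes, waited, should_start_recovery(s, nodes, waited))
```

With the example values, two nodes wait out the full five minutes for the third, while all three nodes joining triggers recovery immediately.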

Hi Otis,

What tooling are you using to gather and chart those metrics?

Mark

Hi Mark,

That graph came from SPM for Elasticsearch. It's like SPM for Solr
(Sematext's Apache Solr performance monitoring tool), but with ES metrics.
It's not 100% polished yet, but that's happening as I type. It's currently
free and you can get it via http://apps.sematext.com/ (you can also get
free Search Analytics from there).

Otis

(In reply to Mark Waddle's post of Wednesday, March 7, 2012 1:19:31 PM UTC+8.)

It's not where Otis's graphs are coming from, but we get similar graphs out of OpenTSDB/tcollector attached to Elasticsearch. (We use OpenTSDB/tcollector with a simple graphite adapter and Coda Hale's metrics to gather metrics from other systems as well.)

-- Paul

++OpenTSDB

+Graphite

(In reply to Paul Brown's post of Wednesday, March 7, 2012 at 7:54 PM.)