Trying to optimize configuration for better cluster restart/recovery

At first, I noticed what some have called "shard thrashing," i.e., during
startup shards are re-allocated as nodes come online.

I have implemented the following by either adding new settings or modifying
existing ones in elasticsearch.yml:

  1. Disable allocation altogether

cluster.routing.allocation.disable_allocation: true

  2. Avoid split-brain in the current 5-node cluster

discovery.zen.minimum_master_nodes: 3

  3. Increase the discovery timeout

discovery.zen.ping.timeout: 100s
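
(For reference, the allocation setting can also be applied dynamically
through the cluster settings API instead of elasticsearch.yml; a rough
sketch, assuming a node reachable on localhost:9200. Note that "transient"
settings do not survive a full cluster restart, while "persistent" ones do.)

curl -XPUT 'http://localhost:9200/_cluster/settings' -d '{
  "transient": {
    "cluster.routing.allocation.disable_allocation": true
  }
}'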

Specific Objective:
When the cluster restarts, force it to re-use the shard allocation that
existed before shutdown.

Attempt:

  • Increased discovery.zen.minimum_master_nodes to 5 in the 5-node cluster,
    with the idea that each node would refuse to become operational until all
    5 nodes in the cluster were recognized.

Result:
Unfortunately, despite setting this equal to the total number of nodes in
the cluster, I observed shard re-allocation once 4 of the 5 nodes were up,
without waiting for the fifth node to come online. And this is with
allocation disabled.

I would like an opinion on whether what I'm trying to accomplish is even
possible:

  • As much as possible, force a restarted cluster to use existing shards as
    they were already allocated
  • Start all nodes at once rather than doing rolling node starts, which
    contributes to shard re-allocation

TIA,
Tony


Shard allocation should never happen if disable_allocation is enabled.
Which version are you using? Are you doing a rolling restart or a full
cluster restart?

Two things that might help. First, execute a flush before restarting; I
believe mismatched transaction log state can cause a shard to be flagged as
out of sync during a restart. Also, play around with the recovery settings
[1]. Try setting gateway.recover_after_nodes (not set by default).

[1]
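
A rough sketch of what I mean, assuming a node reachable on localhost:9200
(the exact threshold is up to you). Flush before shutting the cluster down:

curl -XPOST 'http://localhost:9200/_flush'

and in elasticsearch.yml on each node, hold off recovery until most of the
cluster has joined:

gateway.recover_after_nodes: 4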

Cheers,

Ivan


Hi Ivan,
Thx.

Yes, I have been doing a flush before every cluster shutdown now.
Running ES 1.0 RC1

I have been doing rolling restarts because I have been unable to start all
nodes at nearly the same time and get them all to join, even after extending
the timeout as I described. But I'm speculating that the rolling restart is
contributing to the shards being re-allocated, because nodes that hold shards
for an index may not appear soon enough.

Maybe the entry I made in elasticsearch.yml, exactly as I described it, isn't
correct? I derived it from an ES source that described sending the command
using curl, but I thought it better to enter it directly in elasticsearch.yml.
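
(In case it helps anyone reproducing this, the settings a node actually
picked up can be inspected over the API; a rough sketch against
localhost:9200:)

# settings each node read from elasticsearch.yml
curl 'http://localhost:9200/_nodes/settings?pretty'

# transient/persistent settings applied through the cluster settings API
curl 'http://localhost:9200/_cluster/settings?pretty'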

I'll take a look at your link, thx.

Tony


I've verified that shards are re-allocating after a cluster restart (again,
I'm using 1.0 RC1).
To test this specifically, I loaded a small dataset (verifying results on a
large dataset can take a very long time).

Easy to verify:

  1. In a 5-node cluster, load some apache data. (I loaded only a couple
     dozen days.)
  2. Let the cluster run until all shards are allocated; es-head is good for
     this.
  3. Flush and shut down the cluster.
  4. Bring up only one node and point es-head at it; it should display all 5
     shards for each index residing on the lone active node.
  5. Bring up one additional node, then a third, refreshing es-head every 15
     seconds or so. Shards are first observed replicating to the second node;
     then, when the third node is active, the shards are re-allocated again
     for balancing.
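
(The same observations can be made without es-head; roughly, against any node
on localhost:9200, using the 1.0 _cat and shutdown APIs:)

# flush, then shut the whole cluster down
curl -XPOST 'http://localhost:9200/_flush'
curl -XPOST 'http://localhost:9200/_shutdown'

# after bringing nodes back one at a time, watch where the shards land
curl 'http://localhost:9200/_cat/shards?v'
curl 'http://localhost:9200/_cluster/health?pretty'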

So, either the entry I made in elasticsearch.yml to disable shard allocation
is incorrect, or there is likely a bug. (Or I might fundamentally
misunderstand what disabling shard allocation is supposed to do.)

Maybe I'll re-test on a 0.90 cluster to see if it behaves differently...

Tony


Tony,

Not sure what the cause of your problem is, but you might also want to check
out this setting in the YML file:

gateway.recover_after_nodes

More details about this particular setting are in this video:

http://www.elasticsearch.org/webinars/elasticsearch-pre-flight-checklist/
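
A rough sketch of how that family of gateway settings might look in
elasticsearch.yml for a 5-node cluster (the numbers are only illustrative):

gateway.recover_after_nodes: 4   # don't start recovery until at least 4 nodes have joined
gateway.expected_nodes: 5        # start immediately once all 5 expected nodes are present
gateway.recover_after_time: 5m   # otherwise wait this long once recover_after_nodes is met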


Update:
Whereas my previous attempts to optimize for recovery failed miserably, the
"gateway.recover_after_nodes" setting in elasticsearch.yml worked... to a
point.

I noticed:

  • No ES node was responsive at all after the nodes were brought online
    until the quorum was met.
  • It can take a long time for the ES cluster to agree on a quorum; on my
    tiny 5-node cluster it took approx 10 minutes after the nodes were
    brought online before one started responding to es-head. I had poked all
    the nodes by that point, so it does seem like the cluster starts up all
    at once.
  • But, at least in this early case, shard re-allocation and thrashing is
    not avoided. Before shutting down I didn't carefully record the shard
    mapping across nodes, but I did notice that once indexing settled down,
    most indexes had the expected 10 shards evenly distributed across the
    nodes (2 per node, because every primary shard has a replica). On
    restart, I observed high concentrations of shards on certain nodes and
    fewer on others, not an even distribution.
  • For approx 9 GB of indexed data (800 MB raw data), it has taken a little
    over 40 minutes for the cluster to recover to the "green" state.

So, mixed and somewhat disappointing results. Since shard re-allocation still
seems to happen, although perhaps less with gateway.recover_after_nodes
enabled and configured, I'm still hoping for something that decreases
recovery time further.
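
(For what it's worth, recovery progress can also be watched from the API; a
rough sketch, and I believe the _cat endpoints are new in 1.0:)

# block until the cluster reports green, or the timeout expires
curl 'http://localhost:9200/_cluster/health?wait_for_status=green&timeout=10m&pretty'

# per-shard recovery progress
curl 'http://localhost:9200/_cat/recovery?v'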

Perhaps recovery isn't being done as efficiently as it might.

  1. My impression is that shard content is being evaluated in its full form.
     If so, I imagine shard content and its integrity could be evaluated far
     faster and better by hash.
  2. If hashes are used, I would suggest they be saved as part of the "flush"
     command, or as part of a separate "flush, snapshot and shutdown ES"
     command. When a cluster restarts, the hash table could then be used to
     quickly "snapshot" the existing node and local on-disk data layout
     before commencing recovery and moving shards around.
  3. Speaking of which, maybe at some point it would be useful to document
     what ES does during startup and/or recovery so that we can tinker more
     intelligently.

Thx,
Tony


Tony,

What you are seeing with the shard recovery is normal - but that doesn't mean
it couldn't be improved in the future. For now you can throttle the recovery
using a combination of settings (but you cannot 100% avoid it).
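
A sketch of the kind of settings I mean (values here are only illustrative;
they can go in elasticsearch.yml or be applied through the cluster settings
API):

cluster.routing.allocation.node_concurrent_recoveries: 2    # parallel recoveries per node
cluster.routing.allocation.cluster_concurrent_rebalance: 1  # shards allowed to rebalance at once
indices.recovery.max_bytes_per_sec: 40mb                    # throttle recovery bandwidth
indices.recovery.concurrent_streams: 3                      # streams used per recovery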

Just FYI, there is a reason hashing cannot be done (for now), and it is
discussed in this thread (look for where Zachary describes the segment
divergence scenario to understand more):

https://groups.google.com/forum/#!topic/elasticsearch/9uF-a5vqfkQ


Cool.
Thx all.

Tony