Quickly restarting a node

Hi,

To restart a node in our cluster as quickly as possible, I use the following procedure:

  1. I disable shard allocation except for new primaries:
curl -s -u "$username:$password" -X 'PUT' -H 'Content-Type: application/json' -d '{ "transient": { "cluster.routing.allocation.enable": "new_primaries" } }' "$cluster_url/_cluster/settings"
  2. I perform a synced flush:
curl -s -u "$username:$password" -X 'POST' "$cluster_url/_flush/synced"
  3. I restart the node.
  4. When the node has rejoined the cluster (see the health check after this list), I re-enable shard allocation:
curl -s -u "$username:$password" -X 'PUT' -H 'Content-Type: application/json' -d '{ "transient": { "cluster.routing.allocation.enable": null } }' "$cluster_url/_cluster/settings"
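
To know when it is safe to move on to step 4, one option is to wait for the restarted node to show up again before re-enabling allocation; a minimal sketch, assuming a 6-node cluster (the node count and timeout are illustrative):
curl -s -u "$username:$password" "$cluster_url/_cluster/health?wait_for_nodes=6&timeout=120s"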

My understanding is that cluster health should rapidly transition from yellow to green thanks to the synced flush. However, shard recovery hits throttling and is therefore slow as hell.
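
For reference, that throttling presumably comes from the default recovery limits, which can be relaxed temporarily; a sketch (the values here are illustrative, not recommendations):
curl -s -u "$username:$password" -X 'PUT' -H 'Content-Type: application/json' -d '{ "transient": { "cluster.routing.allocation.node_concurrent_recoveries": 4, "indices.recovery.max_bytes_per_sec": "100mb" } }' "$cluster_url/_cluster/settings"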

Did I miss something?

Our cluster currently contains too many shards and we are working toward reducing that number. Will that solve our problem, or are there other factors influencing node restart duration?

Did the response to the synced flush indicate that it was completely successful? Did you stop indexing while the node was offline? If the answer to either question is no then it's possible that the synced flush marker isn't there on every shard (either it wasn't put in place, or it was put there and then removed) and this results in a slower recovery.
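
As an aside, a quick way to spot which indices reported synced-flush failures is to filter the response, for example with jq (a sketch based on the 6.x synced-flush response shape):
curl -s -u "$username:$password" -X 'POST' "$cluster_url/_flush/synced" | jq 'with_entries(select(.key != "_shards" and .value.failed > 0))'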

Which version are you using?

The synced flush response indicates that almost every shard succeeded: only 22 out of 22472 failed (we really do have too many shards). Indexing wasn't stopped during the node restart, but only a small number of shards should have been touched (I estimate at most 642).

With 6 data nodes, roughly 3745 (22472 / 6) shards are unassigned after a node restart. I expect at most 107 (642 / 6) of them to recover slowly and the remaining shards to recover very quickly (as their flush markers shouldn't have changed).
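
For what it's worth, the unassigned count right after a restart can be double-checked from the cat API, roughly like this:
curl -s -u "$username:$password" "$cluster_url/_cat/shards?h=state" | sort | uniq -c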

For a shard that has been touched during the node restart (so its flush marker has changed), is its recovery duration a function of its size?

I forgot to mention that we are using version 6.6.1.

It depends. In some recoveries Elasticsearch has to make a brand-new copy of the shard. It will re-use any segments that it can, but often there aren't many of these. This was the case for all recoveries in versions before 6.0, and is still the case in more recent versions if there have been too many changes (>512MB of translog), if the node has been offline for too long (>12h), or if the new copy is assigned to a different node from the one that holds the previous, stale copy of the shard.
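
For completeness, those two thresholds correspond to the per-index translog retention settings in 6.x (index.translog.retention.size, default 512mb, and index.translog.retention.age, default 12h). If you wanted to widen that window ahead of a planned restart, something like the following would do it, bearing in mind the values here are only illustrative and retaining more translog uses more disk:
curl -s -u "$username:$password" -X 'PUT' -H 'Content-Type: application/json' -d '{ "index.translog.retention.size": "1gb", "index.translog.retention.age": "24h" }' "$cluster_url/_all/_settings"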

Is that different from what you're seeing? Are you seeing shards recover that you weren't expecting to need recovery?

I'm not sure of my interpretation of the /_recovery information here, but I see almost all of our indices there.
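
For reference, the cat equivalent can be easier to read; something along these lines shows only the recoveries still in flight (a sketch, with an illustrative column selection):
curl -s -u "$username:$password" "$cluster_url/_cat/recovery?v&active_only=true&h=index,shard,time,type,stage,source_node,target_node,bytes_percent"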
