What is the best practice for stopping and starting a running cluster?
My setup:
Elasticsearch 0.90.6
2 Master only nodes - each on their own box
52 Data-only nodes - spread across 6 boxes, with 12, 12, 12, 2, 7, and 7 nodes
per box respectively
Each node runs under supervision (supervisord) so that it is restarted
automatically if it crashes.
Stop/Start routine:
turn off rivers (no more ingest)
turn off shard allocation
"cluster.routing.allocation.disable_allocation":true
"cluster.routing.allocation.disable_replica_allocation":true
stop and restart nodes on each box using supervisorctl
$>supervisorctl stop elasticsearch-1
$>supervisorctl start elasticsearch-1
wait for "initializing_shards" count to reach 0
turn on shard allocation
wait for "unassigned_shards" count to reach 0
turn on rivers
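
For reference, a rough sketch of how that routine can be scripted (host names are placeholders, the grep checks are a crude stand-in for real JSON parsing, river handling is omitted, and the settings are the 0.90.x allocation-disable flags):

# minimal sketch of the stop/start routine above
ES=http://localhost:9200

# disable shard allocation before the restarts
curl -XPUT "$ES/_cluster/settings" -d '{
  "transient": {
    "cluster.routing.allocation.disable_allocation": true,
    "cluster.routing.allocation.disable_replica_allocation": true
  }
}'

# restart the node(s) under supervisord
supervisorctl stop elasticsearch-1
supervisorctl start elasticsearch-1

# wait for initializing_shards to reach 0 (crude substring match)
until curl -s "$ES/_cluster/health" | grep -q '"initializing_shards":0'; do
  sleep 10
done

# re-enable allocation
curl -XPUT "$ES/_cluster/settings" -d '{
  "transient": {
    "cluster.routing.allocation.disable_allocation": false,
    "cluster.routing.allocation.disable_replica_allocation": false
  }
}'

# wait for unassigned_shards to reach 0
until curl -s "$ES/_cluster/health" | grep -q '"unassigned_shards":0'; do
  sleep 10
done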
Result:
We almost always end up with one or a combination of several issues:
nodes pegged on heap and unresponsive (the cluster can't communicate with
them, and they are not reachable via the API)
nodes stuck initializing shards forever
nodes stuck allocating shards forever
"ghost" nodes; a second copy of a node in the cluster state (NOT process
actually running) with that same name, different id. This actually doesnt
affect es performance much but it makes es-head and other tools break due
duplicate node/key name.
Some times, repeated opening and closing and index will get its shards to
allocate and initialize. Sometimes not.
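
That close/open workaround is just the index close/open API, e.g. (the index name and host are placeholders):

# close and reopen a stuck index
curl -XPOST "http://localhost:9200/my_index/_close"
curl -XPOST "http://localhost:9200/my_index/_open"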
In the latest versions (1.0.0.RC1), ES shutdown chooses a strategy for the
order in which nodes are closed; it is less error-prone to shut down the
current master node last.
Startup:
from a shell, run a for loop over ssh to start your favorite wrapper script
on the remote nodes in parallel: master-eligible nodes first,
non-master-eligible nodes last (see the sketch below).
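
A minimal sketch of such a startup loop (host names and the supervisorctl commands are placeholders for whatever wrapper script is actually in use):

# placeholder host lists; adjust for your environment
MASTERS="master1 master2"
DATA_NODES="data1 data2 data3 data4 data5 data6"

# start master-eligible nodes first
for h in $MASTERS; do
  ssh "$h" "supervisorctl start elasticsearch-1" &
done
wait

# then start the data (non-master-eligible) nodes in parallel
for h in $DATA_NODES; do
  ssh "$h" "supervisorctl start all" &
done
wait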
I believe the heart of this issue is JVM memory usage.
So does it make sense to delete warmers before shutdown (so they don't try
to warm during initial node recovery)?
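
For reference, deleting a warmer is just the warmer API; the index and warmer names here are placeholders:

# delete a named warmer from an index
curl -XDELETE "http://localhost:9200/my_index/_warmer/my_warmer"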
Does it make sense to lower the following settings (currently set to 8):
cluster.routing.allocation.node_initial_primaries_recoveries
cluster.routing.allocation.node_concurrent_recoveries
to limit the amount of work any one node will do at once?
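
For example, lowering them would look something like this (the value 2 is only an illustration, not a recommendation, and this assumes these settings are dynamically updatable via the cluster settings API in this version; otherwise they go in elasticsearch.yml):

# lower concurrent recovery limits (2 is an arbitrary example value)
curl -XPUT "http://localhost:9200/_cluster/settings" -d '{
  "transient": {
    "cluster.routing.allocation.node_initial_primaries_recoveries": 2,
    "cluster.routing.allocation.node_concurrent_recoveries": 2
  }
}'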