How to simulate a node in a red state?

Hello,

In our cluster, we have 3 nodes, each running in a virtual machine. Each
shard has 2 replicas, so that each node contains all data.

We are currently reviewing our production processes. In particular, we are
thinking about how we should react if a node gets into a red state.

At the moment, we assume that the fastest and safest way to recover is to
have a backup VM with an empty node ready. If a node fails, we would simply
shut down the red node and start the backup VM, which then should join the
cluster and take over the unassigned shards.

We’d be glad to know if this is a procedure that other ES users are
following as well.

But above all, we haven’t found a way to test and practice this
procedure. Are there ways to deliberately bring a node into a red state? In
our test environments, we regularly see red nodes, but we’ve never found
out how they got into that state.

Thanks for any hints,
Andreas

--

Hi Andreas,

Clusters and indices can both go into red status. An index is red
when one or more of its shards are missing completely (the primary
shard and all replicas for that primary shard). A cluster can also
report a red status when a cluster-level block is in place. For
example, if minimum_master_nodes is configured to 2 and one node loses
its connection to the other 2 nodes, the node that lost the connection
will report a red cluster status.
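
To make the rule above concrete, here is a minimal Python sketch of how
a health colour follows from shard state. ShardGroup and index_status
are illustrative names, not part of any ES client. The sketch assumes
that a shard group with no active primary has no usable copy left,
since a surviving replica would otherwise have been promoted.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ShardGroup:
    """One shard of an index: its primary plus its replicas (hypothetical model)."""
    primary_active: bool
    replicas_configured: int
    replicas_active: int


def index_status(groups: List[ShardGroup]) -> str:
    """Return 'red', 'yellow' or 'green' for a list of shard groups."""
    # Red: at least one shard has no active primary, so part of the data is missing.
    if any(not g.primary_active for g in groups):
        return "red"
    # Yellow: all primaries are active, but some replica copies are not allocated.
    if any(g.replicas_active < g.replicas_configured for g in groups):
        return "yellow"
    # Green: every primary and every configured replica is active.
    return "green"
```

With this model, losing one of three copies of a shard only turns the
index yellow; it takes the loss of every copy of some shard to turn it
red.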

If you want to fix the red cluster status, you need to bring up the
node that you lost, or a new node that uses the data directory from the
lost node. Bringing up an empty node won't resolve the red cluster
status. The way to reduce the chance of a red cluster status is to have
enough nodes running with actual shards and to increase the number of
replicas for an index.
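
As a rough sketch of the last point, the number of replicas of an
existing index can be changed through the index settings endpoint
(PUT /<index>/_settings). The host http://localhost:9200 and the index
name my_index below are placeholders, and build_replica_request is a
hypothetical helper. It only builds the request so that it can be
inspected before anything is sent.

```python
import json
import urllib.request


def build_replica_request(base_url: str, index: str, replicas: int) -> urllib.request.Request:
    """Build (but do not send) a request that sets number_of_replicas for one index."""
    url = f"{base_url.rstrip('/')}/{index}/_settings"
    body = json.dumps({"index": {"number_of_replicas": replicas}}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


# Placeholder host and index name; adjust to your own cluster.
req = build_replica_request("http://localhost:9200", "my_index", 2)
```

To actually apply the change against a running node, the request would
be passed to urllib.request.urlopen(req).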

If you want to deliberately put a cluster in a red state, you need to
shut down enough nodes that, for some shard of an index, both the
primary and all the replicas are unavailable. You'll need to shut
those nodes down at the same time, because ES actively works to
resolve cases where a copy of a shard is not allocated by allocating it
to the rest of the cluster.

Each shard has two replicas, which means in your case all data is on
all nodes. Going into a red status in your current setup means that
all nodes have to go down. Also, how many shards do you have per index?
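
The reasoning above can be checked with a small Python sketch. It
models only which nodes hold a copy of each shard and assumes the nodes
fail at the same moment, so ES has no time to reallocate copies in
between. The function names and the node names are made up for
illustration.

```python
from itertools import combinations
from typing import Dict, Iterable, List, Optional, Set


def lost_shards(allocation: Dict[int, Set[str]], down_nodes: Iterable[str]) -> List[int]:
    """Return the shard ids whose every copy sits on a node that is down."""
    down = set(down_nodes)
    return sorted(shard for shard, nodes in allocation.items() if nodes <= down)


def min_failures_for_red(allocation: Dict[int, Set[str]]) -> Optional[int]:
    """Smallest number of simultaneous node failures that loses a whole shard."""
    all_nodes = sorted(set().union(*allocation.values()))
    for k in range(1, len(all_nodes) + 1):
        for combo in combinations(all_nodes, k):
            if lost_shards(allocation, combo):
                return k
    return None  # no shards at all, so nothing can be lost


# 3 nodes, 2 replicas per shard: every node holds a copy of every shard.
full_copy = {0: {"node1", "node2", "node3"}, 1: {"node1", "node2", "node3"}}
```

For the full_copy layout, min_failures_for_red returns 3, which matches
the statement that all nodes have to go down. With only one replica per
shard spread over three nodes, two simultaneous failures can already be
enough.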

Martijn

On 9 November 2012 11:27, Andreas W andi.weibel@gmail.com wrote:

--

--
Kind regards,

Martijn van Groningen

--