Cluster in RED status: what about write & delete operations?

Hi there. We have a cluster running ES v2.3.1 on N physical servers. After a node failure, the cluster keeps working on the remaining N-1 machines, and the cluster status turns RED because some indices with 0 replicas lost their primary shards (so far so good).
Weird behavior #1: we create a new index with the cluster in RED status, and we see that ALL of its shards are allocated to only one of the N-1 working nodes. Why?
Weird behavior #2: we delete an index, but after a while the index reappears. Why?
Question: what are the limitations that we have to take into account when writing/deleting indices to/from a cluster in RED state?
Thanks a lot.


Hello,

ES will not allow you to index documents (or update them, or perform any other write operation) into an index that is missing a primary shard (i.e. an index that causes the cluster to go RED). However, ES will allow you to index documents into indices that are not RED (i.e. indices that have all of their primary shards active).
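Here is a minimal sketch of that behavior (Python with the requests library, talking to the REST API). The host, index names, and document body are assumptions for illustration, not taken from your cluster:

```python
import requests

ES = "http://localhost:9200"

# Per-index health: the RED entries are the indices missing a primary shard.
health = requests.get(f"{ES}/_cluster/health", params={"level": "indices"}).json()
for name, info in health["indices"].items():
    print(name, info["status"])

# Indexing into a healthy (green/yellow) index succeeds as usual.
ok = requests.post(f"{ES}/healthy-index/doc", json={"field": "value"})
print(ok.status_code)

# Indexing into a RED index blocks until the write timeout expires and is then
# rejected because the primary shard is not active (an UnavailableShardsException
# in the response body).
bad = requests.post(f"{ES}/red-index/doc", json={"field": "value"})
print(bad.status_code, bad.text)
```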

You can still create or delete indices in a RED cluster. The only requirement for creating or deleting an index is a live master node.
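For example (again a sketch with an assumed host and index name), both of these calls go through even while other indices are RED, as long as a master is elected:

```python
import requests

ES = "http://localhost:9200"

# Creating an index only needs a live master node, even in a RED cluster.
requests.put(f"{ES}/new-index",
             json={"settings": {"number_of_shards": 2, "number_of_replicas": 0}})

# Deleting an index likewise only needs a live master node.
requests.delete(f"{ES}/new-index")
```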

> Weird behavior #1: we create a new index with the cluster in RED status, and we see that ALL of its shards are allocated to only one of the N-1 working nodes. Why?

It really depends on how many nodes you have and where the existing indices' shards were allocated before you created the new index. In general, ES tries to balance shard allocation evenly across nodes, but a number of other factors can affect where a shard is placed, for example explicit shard allocation filtering (see Cluster-level shard allocation filtering | Elasticsearch Guide [8.11] | Elastic).
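To dig into it, you could check where the shards of the new index actually landed and whether any non-default allocation settings are in effect. A sketch (assumed host and index name):

```python
import requests

ES = "http://localhost:9200"

# Which node did each shard of the new index end up on?
shards = requests.get(f"{ES}/_cat/shards/new-index",
                      params={"v": "true", "h": "index,shard,prirep,state,node"})
print(shards.text)

# Any cluster-level allocation filters or rebalancing overrides set?
# (With pure defaults, both of these come back empty.)
settings = requests.get(f"{ES}/_cluster/settings").json()
print(settings.get("persistent"))
print(settings.get("transient"))
```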

> Weird behavior #2: we delete an index, but after a while the index reappears. Why?

This is likely due to dangling indices. To prevent indices from being inadvertently wiped out when a brand new master takes over (e.g. during a full cluster restart), ES instead tries to import any indices found on data nodes that don't exist in the cluster state. The problem with this in 2.x (a known bug) is that if a node that was part of the cluster was offline when the index deletion happened, it misses the deletion event, so when it comes back online it does not know that the index was deleted. Instead, it tries to import the index contents it still has on disk as a dangling index. If this is indeed the problem you are seeing, you should see something like this in the log files:

dangling index, exists on local file system, but not in cluster metadata, auto import to cluster state

This is solved in 5.x by introducing index tombstones, which explicitly record index deletions so deleted indices don't inadvertently reappear in the cluster.
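On 2.x, a workaround I would try (an assumption on my part, not an official procedure) is to wait until every node has rejoined the cluster and then issue the delete again, so no node is left holding stale on-disk data to re-import:

```python
import requests
import time

ES = "http://localhost:9200"
EXPECTED_NODES = 5            # assumption: total number of nodes in your cluster
INDEX = "reappearing-index"   # assumption: the index that keeps coming back

# Wait until all nodes are back, so none of them can later re-import the
# index contents it still has on disk as a dangling index.
while requests.get(f"{ES}/_cluster/health").json()["number_of_nodes"] < EXPECTED_NODES:
    time.sleep(10)

# Now every node sees the deletion, and the index should stay deleted.
requests.delete(f"{ES}/{INDEX}")
```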


Hi Ali. Thank you very much for your detailed reply! It is very useful to us.
About point #1: we left all the allocation filtering settings at their default values, so the all-shards-on-one-node behavior still seems a bit strange to us.
About point #2: yes, we found the message "dangling index, exists on local file system, but not in cluster metadata, auto import to cluster state" in our logs :wink:
Thanks a lot.
Cheers