Hi there. We have a cluster running ES v2.3.1 on N physical servers. After a node failure, the cluster keeps working on the remaining N-1 machines, and the cluster status turns RED because some indices configured with 0 replicas lost their primary shards (so far so good).
Weird behavior #1: we create a new index with the cluster in RED status and we see that ALL shards are allocated to only one node out of the N-1 working nodes. Why?
Weird behavior #2: we delete an index, but after a while the index reappears. Why?
Question: what are the limitations that we have to take into account when writing/deleting indices to/from a cluster in RED state?
Thanks a lot.
Hello,
ES will not allow you to index documents (or perform updates or any other write operation) into an index that is missing a primary shard (i.e. an index that is causing the cluster to be RED). However, ES will allow you to index documents into indices that are not RED (i.e. indices that have all of their primary shards active).
You can create or delete indices in a RED cluster. The only requirement to creating or deleting indices is a live master node.
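To make this concrete, here is a minimal sketch using the Python client (elasticsearch-py, 2.x-era API). The host and the index name "logs-new" are placeholders I'm assuming for illustration, not values from your cluster:

```python
# Minimal sketch with the Python client (elasticsearch-py 2.x);
# the host and the index name "logs-new" are placeholders.
from elasticsearch import Elasticsearch

es = Elasticsearch(["http://localhost:9200"])

# Creating (and deleting) an index only needs a live, elected master node,
# even while the overall cluster status is RED.
es.indices.create(index="logs-new", ignore=400)   # ignore "index already exists"

# The cluster-wide status may be RED, but each index has its own health;
# writes only succeed against indices whose primary shards are all active.
print(es.cluster.health()["status"])                  # e.g. "red"
print(es.cluster.health(index="logs-new")["status"])  # e.g. "yellow" or "green"

# Indexing works as long as this particular index is not RED.
es.index(index="logs-new", doc_type="doc", body={"msg": "hello"})

# Deletion likewise only requires a live master.
es.indices.delete(index="logs-new", ignore=404)
```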
Weird behavior #1: we create a new index with the cluster in RED status and we see that ALL shards are allocated to only one node out of the N-1 working nodes. Why?
It really depends on how many nodes you have and where the previous indices' shards were allocated before you created the new index. In general, ES tries to balance shard allocation evenly across nodes, but any number of other factors can affect where a shard ends up, for example explicit shard allocation filtering (see Cluster-level shard allocation filtering | Elasticsearch Guide [8.11] | Elastic).
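If you want to check where your shards actually landed and whether any allocation settings are in play, a hedged sketch with the Python client could look like this; "node-3" is a hypothetical node name, and the filter shown is just one example of a setting that can skew placement:

```python
# Sketch for inspecting shard placement and cluster-level allocation settings.
from elasticsearch import Elasticsearch

es = Elasticsearch(["http://localhost:9200"])

# List every shard with its index, state, and the node it was assigned to.
print(es.cat.shards(v=True))

# Example of cluster-level allocation filtering: exclude a node ("node-3",
# hypothetical) from receiving shards. Settings like this, disk watermarks,
# or awareness attributes can all influence where new shards are placed.
es.cluster.put_settings(body={
    "transient": {"cluster.routing.allocation.exclude._name": "node-3"}
})

# Dump the current cluster settings to verify nothing was set explicitly.
print(es.cluster.get_settings())
```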
Weird behavior #2: we delete an index, but after a while the index reappears. Why?
This is likely due to dangling indices. In order to prevent indices from being inadvertently wiped out when a brand new master takes over (e.g. during a full cluster restart), ES instead tries to import any indices found on data nodes that don't exist in the cluster state. The problem with this in 2.x (a known bug) is that if a node that was part of the cluster is offline when an index deletion happens, it misses the deletion event, so when it comes back online it does not know that the index was deleted. Instead, it imports the index contents it still has on disk as a dangling index. If this is indeed the problem you are seeing, then you should see something like this in the log files:
dangling index, exists on local file system, but not in cluster metadata, auto import to cluster state
This is solved in 5.x by introducing index tombstones that explicitly record index deletions, so deleted indices don't inadvertently reappear in the cluster.
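If you want to observe the symptom directly, a small sketch like the following (Python client, hypothetical index name "old-logs") deletes an index and then polls for it while the previously offline node rejoins:

```python
# Sketch to detect the dangling-index re-import: delete an index while one
# data node is offline, then poll for it after that node rejoins.
# "old-logs" is a hypothetical index name.
import time
from elasticsearch import Elasticsearch

es = Elasticsearch(["http://localhost:9200"])

es.indices.delete(index="old-logs", ignore=404)

# On 2.x, once the node that missed the deletion rejoins, it may re-import
# its on-disk copy as a dangling index and the name will show up again.
for _ in range(30):
    if es.indices.exists(index="old-logs"):
        print("index reappeared - likely re-imported as a dangling index")
        break
    time.sleep(10)
```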
Hi Ali. Thank you very much for your detailed reply! It is very useful to us.
About point #1: we left all the allocation filtering settings at their default values, so the all-in-one-node allocation behavior still sounds a bit strange to us.
About point #2: yes, we found this message in our logs: dangling index, exists on local file system, but not in cluster metadata, auto import to cluster state
Thanks a lot.
Cheers