High availability of system indices: how to mitigate system index failures

I notice that system indices in Elasticsearch have "index.number_of_replicas": "0" and this cannot be updated. Is there a reason why this cannot be set to 1 (or more) and how safe is it to have no replicas for system indices from a high availability point of view? How can we mitigate system index failures?

They should be set as auto-expanding using the index.auto_expand_replicas setting in the relevant template.

How many data nodes do you have in your cluster?


We are investigating two setups: two 3-node clusters and one 6-node cluster. The two 3-node clusters are in separate data centers and will be used as a multi-cluster HA implementation (active-passive).
The single 6-node cluster has 3 nodes in each data center and should act as an active-active HA solution.
I know how to deal with zones for replicas, cluster replication etc., but I am not sure how to make the system indices highly available in each solution.
Maybe all system indices are secretly available on all (data) nodes? That would be nice!

OK, I'll close this topic. Apparently, when you start with a multi-node cluster, all system indices do have replicas. The settings show the following attributes:
"index.auto_expand_replicas": "0-1",
"index.number_of_replicas": "1"

It seems that system indices high availability is taken care of automatically. Great!
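
For anyone wondering how "0-1" turns into 0 replicas on a single node and 1 replica on a multi-node cluster, here is a rough Python sketch of the rule as I understand it. The function name effective_replicas is made up for illustration, and this is a simplified model: it assumes the replica count expands to the number of data nodes minus one, clamped to the configured range, and it ignores allocation filtering.

```python
def effective_replicas(auto_expand: str, data_nodes: int) -> int:
    """Simplified model of index.auto_expand_replicas (hypothetical helper).

    auto_expand is a range such as "0-1" or "0-all".
    Assumption: each node holds at most one copy of a shard, so the
    replica count can never exceed data_nodes - 1.
    """
    low_text, high_text = auto_expand.split("-", 1)
    low = int(low_text)
    # "all" means no fixed upper bound other than the node count.
    high = data_nodes - 1 if high_text == "all" else int(high_text)
    wanted = min(high, data_nodes - 1)
    # Never go below the lower bound of the range.
    return max(low, wanted)


# Print the replica count "0-1" resolves to for a few cluster sizes.
for nodes in (1, 3, 6):
    print(nodes, "data node(s):", effective_replicas("0-1", nodes), "replica(s)")
```

Under this model a single-node cluster ends up with 0 replicas, while the 3-node and 6-node clusters both end up with 1, which matches the settings shown above.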

Note that (as per the docs) a single cluster split evenly across two data centers doesn't work as an HA solution:

You cannot configure a two-zone cluster so that it can tolerate the loss of either zone because this is theoretically impossible.
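
The arithmetic behind that statement can be sketched in a few lines of Python. This is only an illustration, not how Elasticsearch implements it: the helper names are hypothetical, and it assumes every node is master-eligible and counted as a voter, and that the cluster needs a strict majority of voters to elect a master.

```python
def survives_zone_loss(zone_sizes, lost_zone):
    """Return True if the remaining voters still form a strict majority.

    zone_sizes: number of master-eligible nodes per zone (assumed voters).
    lost_zone: index of the zone that goes down.
    """
    total = sum(zone_sizes)
    remaining = total - zone_sizes[lost_zone]
    return remaining * 2 > total


def tolerates_any_single_zone_loss(zone_sizes):
    """True only if the cluster keeps a majority whichever zone is lost."""
    return all(survives_zone_loss(zone_sizes, z) for z in range(len(zone_sizes)))


# The 3 + 3 layout from this thread, an uneven two-zone layout,
# and a layout with a single tiebreaker node in a third zone.
for layout in ([3, 3], [3, 2], [3, 3, 1]):
    print(layout, tolerates_any_single_zone_loss(layout))
```

With two zones, whichever zone holds half or more of the voters cannot be lost, so no split of nodes across exactly two zones passes the check. Adding a third zone, even with one tiebreaker node, is what lets either main zone fail.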