Loss of Elasticsearch Replicas/Shards After Node Failures

Hello,

I'm facing an issue with my Elasticsearch cluster and I'm looking for some guidance or suggestions on what might be happening.

I have an Elasticsearch cluster running version 6.5.1 with JVM 11.0.11. Recently, I've noticed that some of my indices, which were originally configured with 2 shards and 2 replicas, now have 1 shard and 1 replica.

This happened after an incident in which some nodes in my cluster became unreachable but later returned to normal operation. I suspect the shards and replicas were lost during this period.

My questions are as follows:

  1. How is it possible that shards and replicas were lost during the process of node recovery?
  2. Is there a way to prevent or monitor these reallocations during such incidents?
  3. How can I restore the original configuration of 2 shards and 2 replicas for these indices?

I appreciate any help or insights on this issue. I'm looking to better understand how Elasticsearch manages shard and replica reallocations during cluster failures and how to ensure consistency in index configurations.

Thank you!

This is a very old version that has been EOL for a long time. One common issue with clusters running version 6.x and earlier is misconfiguration, which can cause problems like the ones you are describing. The first thing I would check is therefore that minimum_master_nodes is set correctly, which depends on the number of master-eligible nodes in your cluster. Please share information about the topology of your cluster and what this setting is set to.
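If it helps, here is a quick way to check both, as a sketch using the standard _cat and cluster settings APIs (adjust for your security setup, e.g. authentication):

# List nodes and their roles; an "m" in node.role marks a master-eligible node
GET _cat/nodes?v&h=name,node.role,master

# Show the effective value of minimum_master_nodes, including defaults
GET _cluster/settings?include_defaults=true&filter_path=**.minimum_master_nodes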

Hello @Christian_Dahlqvist, thank you for your answer!

I have 2 master nodes and 8 data nodes, with the full settings shown below:

MASTER NODES

node.name: ${HOSTNAME}
node.data: false
node.master: true
node.ingest: false

cluster.name: cluster-bi

xpack.security.enabled: true

xpack.security.transport.ssl.enabled: true
xpack.security.transport.ssl.verification_mode: certificate
xpack.security.transport.ssl.keystore.path: certs/elastic-certificates.p12
xpack.security.transport.ssl.truststore.path: certs/elastic-certificates.p12

path.data: /var/lib/elasticsearch
path.logs: /var/log/elasticsearch

network.bind_host: "_ens3:ipv4_"
network.publish_host: "_ens4:ipv4_"
network.host: [ "_ens4:ipv4_", "_ens3:ipv4_", "_lo:ipv4_" ]
transport.host: _ens4:ipv4_

http.port: 9200

cluster.initial_master_nodes: ["192.168.0.101", "192.168.0.102"]

discovery.zen.ping.unicast.hosts: [ "192.168.0.101", "192.168.0.102", "192.168.0.201", "192.168.0.202", "192.168.0.203", "192.168.0.204", "192.168.0.205", "192.168.0.206", "192.168.0.207", "192.168.0.208"]
discovery.zen.minimum_master_nodes: 1

DATA NODES

node.name: ${HOSTNAME}
node.data: true
node.master: false
node.ingest: true

cluster.name: cluster-bi
#cluster.remote.connect: false


xpack.security.enabled: true

xpack.security.transport.ssl.enabled: true
xpack.security.transport.ssl.verification_mode: certificate
xpack.security.transport.ssl.keystore.path: certs/elastic-certificates.p12
xpack.security.transport.ssl.truststore.path: certs/elastic-certificates.p12
path.data: /mnt/disk/elasticsearch
path.logs: /var/log/elasticsearch

network.bind_host: "_ens3:ipv4_"
network.publish_host: "_ens4:ipv4_"
network.host: [ "_ens4:ipv4_", "_ens3:ipv4_", "_lo:ipv4_" ]
transport.host: _ens4:ipv4_

http.port: 9200

cluster.initial_master_nodes: ["192.168.0.101","192.168.0.102"]

discovery.zen.ping.unicast.hosts: [ "192.168.0.101", "192.168.0.102", "192.168.0.201", "192.168.0.202", "192.168.0.203", "192.168.0.204", "192.168.0.205", "192.168.0.206", "192.168.0.207", "192.168.0.208"]
discovery.zen.minimum_master_nodes: 1

With 2 master-eligible nodes, discovery.zen.minimum_master_nodes should be set to 2. Your cluster is therefore incorrectly configured, which can lead to split-brain scenarios and a lot of issues, including inconsistencies and data loss.

You should also always aim to have 3 master-eligible nodes in the cluster, as that allows you to keep operating even if one of the master-eligible nodes goes down. If you had 3 master-eligible nodes, discovery.zen.minimum_master_nodes should still be set to 2.
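As an illustration only (a minimal sketch of the relevant line, assuming you stay on 2 master-eligible nodes for now), every master-eligible node would need this in elasticsearch.yml:

discovery.zen.minimum_master_nodes: 2

In 6.x this setting can also be applied to a running cluster through the cluster settings API, for example:

PUT _cluster/settings
{
  "persistent": {
    "discovery.zen.minimum_master_nodes": 2
  }
}

Note that with only 2 master-eligible nodes this means no master can be elected if either of them is down, which is exactly why 3 master-eligible nodes is the recommendation.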

