Loss of Elasticsearch Replicas/Shards After Node Failures

Hello,

I'm facing an issue with my Elasticsearch cluster and I'm looking for some guidance or suggestions on what might be happening.

I have an Elasticsearch cluster running version 6.5.1 with JVM 11.0.11. Recently, I've noticed that some of my indices, which were originally configured with 2 shards and 2 replicas, now have 1 shard and 1 replica.

This happened after an incident in which some nodes in my cluster became unreachable but later returned to normal operation. I suspect the shards and replicas were lost during this period.

My questions are as follows:

  1. How is it possible that shards and replicas were lost during the process of node recovery?
  2. Is there a way to prevent or monitor these reallocations during such incidents?
  3. How can I restore the original configuration of 2 shards and 2 replicas for these indices?

I appreciate any help or insights on this issue. I'm looking to better understand how Elasticsearch manages shard and replica reallocations during cluster failures and how to ensure consistency in index configurations.

Thank you!

This is a very old version that has been EOL for a long time. One common issue with clusters running version 6.x and earlier is misconfiguration, which can cause problems like the ones you are describing. The first thing I would check is therefore that minimum_master_nodes is set correctly, which depends on the number of master-eligible nodes in your cluster. Please share information about the topology of your cluster and what this setting is set to.
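If it helps, here is a quick way to check both, as a sketch using the standard _cat and cluster settings APIs (adjust for your security setup, e.g. authentication):

# List nodes and their roles; an "m" in node.role marks a master-eligible node
GET _cat/nodes?v&h=name,node.role,master

# Show the effective value of minimum_master_nodes, including defaults
GET _cluster/settings?include_defaults=true&filter_path=**.minimum_master_nodes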

Hello @Christian_Dahlqvist, thank you for your answer!

I have 2 master nodes and 8 data nodes, with the full settings shown below:

MASTER NODES

node.name: ${HOSTNAME}
node.data: false
node.master: true
node.ingest: false

cluster.name: cluster-bi

xpack.security.enabled: true

xpack.security.transport.ssl.enabled: true
xpack.security.transport.ssl.verification_mode: certificate
xpack.security.transport.ssl.keystore.path: certs/elastic-certificates.p12
xpack.security.transport.ssl.truststore.path: certs/elastic-certificates.p12

path.data: /var/lib/elasticsearch
path.logs: /var/log/elasticsearch

network.bind_host: "_ens3:ipv4_"
network.publish_host: "_ens4:ipv4_"
network.host: [ "_ens4:ipv4_", "_ens3:ipv4_", "_lo:ipv4_" ]
transport.host: _ens4:ipv4_

http.port: 9200

cluster.initial_master_nodes: ["192.168.0.101", "192.168.0.102"]

discovery.zen.ping.unicast.hosts: [ "192.168.0.101", "192.168.0.102", "192.168.0.201", "192.168.0.202", "192.168.0.203", "192.168.0.204", "192.168.0.205", "192.168.0.206", "192.168.0.207", "192.168.0.208"]
discovery.zen.minimum_master_nodes: 1

DATA NODES

node.name: ${HOSTNAME}
node.data: true
node.master: false
node.ingest: true

cluster.name: cluster-bi
#cluster.remote.connect: false


xpack.security.enabled: true

xpack.security.transport.ssl.enabled: true
xpack.security.transport.ssl.verification_mode: certificate
xpack.security.transport.ssl.keystore.path: certs/elastic-certificates.p12
xpack.security.transport.ssl.truststore.path: certs/elastic-certificates.p12
path.data: /mnt/disk/elasticsearch
path.logs: /var/log/elasticsearch

network.bind_host: "_ens3:ipv4_"
network.publish_host: "_ens4:ipv4_"
network.host: [ "_ens4:ipv4_", "_ens3:ipv4_", "_lo:ipv4_" ]
transport.host: _ens4:ipv4_

http.port: 9200

cluster.initial_master_nodes: ["192.168.0.101","192.168.0.102"]

discovery.zen.ping.unicast.hosts: [ "192.168.0.101", "192.168.0.102", "192.168.0.201", "192.168.0.202", "192.168.0.203", "192.168.0.204", "192.168.0.205", "192.168.0.206", "192.168.0.207", "192.168.0.208"]
discovery.zen.minimum_master_nodes: 1

With 2 master-eligible nodes, discovery.zen.minimum_master_nodes should be set to 2. Your cluster is therefore incorrectly configured, which can lead to split-brain scenarios and a lot of issues, including inconsistencies and data loss.

You should also always aim to have 3 master-eligible nodes in the cluster, as that allows you to keep operating even if one of the master-eligible nodes goes down. If you had 3 master-eligible nodes, discovery.zen.minimum_master_nodes should still be set to 2.
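As an illustration only (a minimal sketch of the relevant line, assuming you stay on 2 master-eligible nodes for now), every master-eligible node would need this in elasticsearch.yml:

discovery.zen.minimum_master_nodes: 2

In 6.x this setting can also be applied to a running cluster through the cluster settings API, for example:

PUT _cluster/settings
{
  "persistent": {
    "discovery.zen.minimum_master_nodes": 2
  }
}

Note that with only 2 master-eligible nodes this means no master can be elected if either of them is down, which is exactly why 3 master-eligible nodes is the recommendation.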

