The cluster never changes the assigned master node
Elasticsearch version 6.8.
I have a cluster with 6 data nodes, 3 ingest nodes, and 5 dedicated master nodes (master role only). Over the last few days the master nodes have been going down in a cascade. Master node 5 is always the one elected as master and is always the first to go down, taking the other master nodes with it.
At first I thought it was a memory issue, but the problem persists even after increasing the heap from 8 GB to 13 GB.
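For reference, I have been watching heap usage on the master nodes with requests along these lines (the host and port match the master config further down; the node name in the second request is just an example):

curl -s 'http://10.0.3.59:9200/_cat/nodes?v&h=name,master,heap.percent,heap.max'
curl -s 'http://10.0.3.59:9200/_nodes/CGSS-CLUSTER01-MASTER-05/stats/jvm?pretty'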
This is my cluster configuration:
name heap.percent heap.max ram.percent ram.max master cpu load_1m load_5m load_15m version
CGSS-CLUSTER01-DATA-1 42 13.9gb 98 15.5gb - 0 0.07 0.02 0.00 6.8.23
CGSS-CLUSTER01-DATA-2 45 13.9gb 99 15.5gb - 0 0.19 0.07 0.03 6.8.23
CGSS-CLUSTER01-DATA-3 28 12.9gb 97 15.5gb - 0 0.02 0.01 0.00 6.8.23
CGSS-CLUSTER01-DATA-4 38 12.9gb 99 15.5gb - 1 0.00 0.04 0.07 6.8.23
CGSS-CLUSTER01-DATA-5 14 12.9gb 99 15.5gb - 0 0.33 0.24 0.26 6.8.23
CGSS-CLUSTER01-DATA-6 37 12.9gb 99 15.5gb - 0 0.13 0.10 0.06 6.8.23
CGSS-CLUSTER01-INGEST-01 62 7.9gb 99 9.6gb - 0 0.00 0.00 0.00 6.8.23
CGSS-CLUSTER01-INGEST-02 66 5.9gb 98 7.7gb - 0 0.00 0.01 0.00 6.8.23
CGSS-CLUSTER01-INGEST-03 47 7.9gb 95 9.6gb - 1 0.16 0.21 0.15 6.8.23
CGSS-CLUSTER01-MASTER-01 12 12.9gb 99 14.5gb - 0 1.90 1.80 1.59 6.8.23
CGSS-CLUSTER01-MASTER-02 13 12.9gb 99 14.5gb - 0 0.00 0.00 0.07 6.8.23
CGSS-CLUSTER01-MASTER-03 15 12.9gb 99 14.6gb - 0 0.00 0.00 0.00 6.8.23
CGSS-CLUSTER01-MASTER-04 13 12.9gb 99 14.6gb - 0 0.00 0.00 0.00 6.8.23
CGSS-CLUSTER01-MASTER-05 30 12.9gb 99 14.6gb * 0 0.00 0.00 0.00 6.8.23
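(The listing above comes from a request roughly like the following; the exact parameter list is an approximation:)

GET _cat/nodes?v&h=name,heap.percent,heap.max,ram.percent,ram.max,master,cpu,load_1m,load_5m,load_15m,version&s=name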
=======================================
This is the elasticsearch.yml file for the master nodes:
# ======================== Elasticsearch Configuration =========================
#
# NOTE: Elasticsearch comes with reasonable defaults for most settings.
# Before you set out to tweak and tune the configuration, make sure you
# understand what you are trying to accomplish and the consequences.
#
# The primary way of configuring a node is via this file. This template lists
# the most important settings you may want to configure for a production cluster.
#
# Please consult the documentation for further information on configuration options:
# https://www.elastic.co/guide/en/elasticsearch/reference/index.html
#
# ---------------------------------- Cluster -----------------------------------
#
# Use a descriptive name for your cluster:
#
cluster.name: MON-CGSSESCLUSTER01-S
#
# ------------------------------------ Node ------------------------------------
#
# Use a descriptive name for the node:
#
node.name: CGSS-CLUSTER01-MASTER-01
#
# Add custom attributes to the node:
#
node.attr.zone: zone1
#
node.master: true
node.data: false
node.ingest: false
#
# ----------------------------------- Paths ------------------------------------
#
# Path to directory where to store the data (separate multiple locations by comma):
#
path.data: /cgss_systems/data
#
# Path to log files:
#
path.logs: /cgss_systems/logs
#
# ----------------------------------- Memory -----------------------------------
#
# Lock the memory on startup:
#
bootstrap.memory_lock: true
#
# Make sure that the heap size is set to about half the memory available
# on the system and that the owner of the process is allowed to use this
# limit.
#
# Elasticsearch performs poorly when the system is swapping the memory.
#
# ---------------------------------- Network -----------------------------------
#
# Set the bind address to a specific IP (IPv4 or IPv6):
#
network.host: 10.0.3.59
#
# Set a custom port for HTTP:
#
http.port: 9200
#
# For more information, consult the network module documentation.
#
# --------------------------------- Discovery ----------------------------------
#
# Pass an initial list of hosts to perform discovery when new node is started:
# The default list of hosts is ["127.0.0.1", "[::1]"]
#
discovery.zen.ping.unicast.hosts: ["10.0.3.57", "10.0.3.58", "10.0.3.59", "10.0.3.60", "10.0.3.61", "10.0.3.62", "10.0.3.63", "10.0.3.64", "10.0.3.65", "10.0.3.66", "10.0.3.67","10.0.3.69","10.0.3.71","10.0.3.75"]
#
# Prevent the "split brain" by configuring the majority of nodes (total number of master-eligible nodes / 2 + 1):
#
discovery.zen.minimum_master_nodes: 3
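# With 5 master-eligible nodes, the majority is 5 / 2 + 1 = 3, so this value matches the quorum formula above.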
# Adjustments to avoid the master node outages.
discovery.zen.fd.ping_interval: 5s # Default: 1s
discovery.zen.fd.ping_timeout: 60s # Default: 30s
discovery.zen.fd.ping_retries: 6 # Default: 3
#
# For more information, consult the zen discovery module documentation.
#
# ---------------------------------- Gateway -----------------------------------
#
# Block initial recovery after a full cluster restart until N nodes are started:
#
gateway.recover_after_nodes: 6
#
# For more information, consult the gateway module documentation.
#
# ---------------------------------- Various -----------------------------------
#
# Require explicit names when deleting indices:
#
action.destructive_requires_name: true
xpack.monitoring.enabled: false
xpack.ml.enabled: false
xpack.graph.enabled: false
xpack.reporting.enabled: false
xpack.security.enabled: false
xpack.watcher.enabled: false
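In case it is useful, these are a couple of checks I run when the cascade starts, to see which node is elected and whether cluster-state tasks are backing up (host and port taken from the config above; the commands are only a sketch of what I use):

curl -s 'http://10.0.3.59:9200/_cat/master?v'
curl -s 'http://10.0.3.59:9200/_cat/pending_tasks?v'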