The cluster never changes the assigned master node

Elasticsearch version 6.8.
I have a cluster with 6 data nodes, 3 ingest nodes, and 5 dedicated master nodes (master only). Over the last few days, the master nodes have been cascading down. I see that master node 5 is always elected as master and is always the first to go down, taking the other master nodes with it.
At first I thought it was probably a memory issue, but the problem was not resolved when I increased the heap from 8 GB to 13 GB.
This is my cluster configuration:

name heap.percent heap.max ram.percent ram.max master cpu load_1m load_5m load_15m version
CGSS-CLUSTER01-DATA-1    42 13.9gb 98 15.5gb - 0 0.07 0.02 0.00 6.8.23
CGSS-CLUSTER01-DATA-2    45 13.9gb 99 15.5gb - 0 0.19 0.07 0.03 6.8.23
CGSS-CLUSTER01-DATA-3    28 12.9gb 97 15.5gb - 0 0.02 0.01 0.00 6.8.23
CGSS-CLUSTER01-DATA-4    38 12.9gb 99 15.5gb - 1 0.00 0.04 0.07 6.8.23
CGSS-CLUSTER01-DATA-5    14 12.9gb 99 15.5gb - 0 0.33 0.24 0.26 6.8.23
CGSS-CLUSTER01-DATA-6    37 12.9gb 99 15.5gb - 0 0.13 0.10 0.06 6.8.23
CGSS-CLUSTER01-INGEST-01 62  7.9gb 99  9.6gb - 0 0.00 0.00 0.00 6.8.23
CGSS-CLUSTER01-INGEST-02 66  5.9gb 98  7.7gb - 0 0.00 0.01 0.00 6.8.23
CGSS-CLUSTER01-INGEST-03 47  7.9gb 95  9.6gb - 1 0.16 0.21 0.15 6.8.23
CGSS-CLUSTER01-MASTER-01 12 12.9gb 99 14.5gb - 0 1.90 1.80 1.59 6.8.23
CGSS-CLUSTER01-MASTER-02 13 12.9gb 99 14.5gb - 0 0.00 0.00 0.07 6.8.23
CGSS-CLUSTER01-MASTER-03 15 12.9gb 99 14.6gb - 0 0.00 0.00 0.00 6.8.23
CGSS-CLUSTER01-MASTER-04 13 12.9gb 99 14.6gb - 0 0.00 0.00 0.00 6.8.23
CGSS-CLUSTER01-MASTER-05 30 12.9gb 99 14.6gb * 0 0.00 0.00 0.00 6.8.23

This is the elasticsearch.yml file for the master nodes:

# ======================== Elasticsearch Configuration =========================
#
# NOTE: Elasticsearch comes with reasonable defaults for most settings.
#       Before you set out to tweak and tune the configuration, make sure you
#       understand what are you trying to accomplish and the consequences.
#
# The primary way of configuring a node is via this file. This template lists
# the most important settings you may want to configure for a production cluster.
#
# Please consult the documentation for further information on configuration options:
# https://www.elastic.co/guide/en/elasticsearch/reference/index.html
#
# ---------------------------------- Cluster -----------------------------------
#
# Use a descriptive name for your cluster:
#
cluster.name: MON-CGSSESCLUSTER01-S
#
# ------------------------------------ Node ------------------------------------
#
# Use a descriptive name for the node:
#
node.name: CGSS-CLUSTER01-MASTER-01
#
# Add custom attributes to the node:
#
node.attr.zone: zone1
#
node.master: true
node.data: false
node.ingest: false


#
# ----------------------------------- Paths ------------------------------------
#
# Path to directory where to store the data (separate multiple locations by comma):
#
path.data: /cgss_systems/data
#
# Path to log files:
#
path.logs: /cgss_systems/logs
#
# ----------------------------------- Memory -----------------------------------
#
# Lock the memory on startup:
#
bootstrap.memory_lock: true
#
# Make sure that the heap size is set to about half the memory available
# on the system and that the owner of the process is allowed to use this
# limit.
#
# Elasticsearch performs poorly when the system is swapping the memory.
#
# ---------------------------------- Network -----------------------------------
#
# Set the bind address to a specific IP (IPv4 or IPv6):
#
network.host: 10.0.3.59
#
# Set a custom port for HTTP:
#
http.port: 9200
#
# For more information, consult the network module documentation.
#
# --------------------------------- Discovery ----------------------------------
#
# Pass an initial list of hosts to perform discovery when new node is started:
# The default list of hosts is ["127.0.0.1", "[::1]"]
#
discovery.zen.ping.unicast.hosts:  ["10.0.3.57", "10.0.3.58", "10.0.3.59", "10.0.3.60", "10.0.3.61", "10.0.3.62", "10.0.3.63", "10.0.3.64", "10.0.3.65", "10.0.3.66", "10.0.3.67","10.0.3.69","10.0.3.71","10.0.3.75"]
#
# Prevent the "split brain" by configuring the majority of nodes (total number of master-eligible nodes / 2 + 1):
#
discovery.zen.minimum_master_nodes: 3

# Adjustments to try to prevent the master nodes from falling over.
discovery.zen.fd.ping_interval: 5s       # Default: 1s 
discovery.zen.fd.ping_timeout: 60s        # Default: 30s
discovery.zen.fd.ping_retries: 6          # Default: 3 

#
# For more information, consult the zen discovery module documentation.
#
# ---------------------------------- Gateway -----------------------------------
#
# Block initial recovery after a full cluster restart until N nodes are started:
#
gateway.recover_after_nodes: 6 
#
# For more information, consult the gateway module documentation.
#
# ---------------------------------- Various -----------------------------------
#
# Require explicit names when deleting indices:
#
action.destructive_requires_name: true

xpack.monitoring.enabled: false
xpack.ml.enabled: false
xpack.graph.enabled: false
xpack.reporting.enabled: false
xpack.security.enabled: false
xpack.watcher.enabled: false

As a matter of urgency, please upgrade. The current release is 9.0.3, so you are already 3 major versions behind.
Also, 5 master-only nodes do not make a lot of sense IMO. Why this architecture?
3 master-eligible nodes are enough.

Hi David,

I know new versions have already been released, but the migration process isn't straightforward in the environment I'm currently in.
Five master nodes make sense in this environment because I have two different locations in an on-premises deployment. If I lose one location (due to network, power, or other issues), I always keep the other half of the cluster online.

This only works if you lose the half with the minority of the master nodes; if you lose the half where the majority of the masters are located, your cluster will be down.

So this would only protect you against a specific kind of failure.

Another thing: it seems that you have all your nodes in discovery.zen.ping.unicast.hosts, but this setting should list only the master-eligible nodes.
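A minimal sketch of that change, assuming only the five dedicated masters are listed (the addresses below are placeholders, not the cluster's actual master IPs):

# elasticsearch.yml on every node: list only the master-eligible nodes for discovery
# (placeholder values; substitute the five actual master node addresses)
discovery.zen.ping.unicast.hosts: ["<master-01-ip>", "<master-02-ip>", "<master-03-ip>", "<master-04-ip>", "<master-05-ip>"]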

Hi Leandro, thanks for the information about discovery.zen. Regarding the other issue, if I lose my primary location, I just need to start a virtual machine in the other location, and both sites are constantly monitored. Additionally, the cluster is configured with a delay for reassigning shards.
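For reference, that reassignment delay is normally controlled by the index.unassigned.node_left.delayed_timeout index setting; a minimal sketch of how it can be set, using the same endpoint as the curl example later in this thread (the 10m value is purely illustrative, not the cluster's actual setting):

curl -XPUT -H 'Content-Type: application/json' 10.0.3.71:9200/_all/_settings -d '
{
  "settings": {
    "index.unassigned.node_left.delayed_timeout": "10m"
  }
}'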

What is the rationale behind these changes?

If you have 2 locations, I do not see how 3 dedicated master nodes would behave any differently.

Although this might work, it could lead to data loss. Also note that this type of arrangement will most likely not work in version 7 and upwards, as resiliency has been improved and there are more stringent checks in place.


Hi Christian, the discovery configuration changes are related to possible network issues. I found a timeout error in the log when a node was searching for the master. It's a test, but I'm pretty sure I don't have any network issues... I'm just grasping at straws.

Regarding the five master nodes, could they be part of the problem?

I do not think that is the problem, as it would not cause nodes to fall over.

Why do the dedicated master nodes have so much heap? Dedicated master nodes should not hold data or serve requests, so they should only require a relatively small heap of e.g. 2 GB.
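For reference, the heap for a dedicated master is set in jvm.options; a minimal sketch along the lines suggested above (the 2g value is only the example size mentioned here, and the path is the default package location, which may differ in this environment):

# /etc/elasticsearch/jvm.options (default package location; adjust if installed elsewhere)
# keep -Xms and -Xmx identical so the heap is not resized at runtime
-Xms2g
-Xmx2g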

What is the full output of the cluster stats API?

I reduced the heap to 8 GB. I have a lot of information from different sources, and I made some mistakes early on in the project that I am now trying to fix, but I need time. The result is:

ebaum-OptiPlex-7060:~/Development/elastics-scripts/bash-scripts$ curl -XGET  10.0.3.71:9200/_cluster/stats?pretty
{
  "_nodes" : {
    "total" : 12,
    "successful" : 12,
    "failed" : 0
  },
  "cluster_name" : "MON-CGSSESCLUSTER01-S",
  "cluster_uuid" : "jVuNT4WqTKG3QZSRg_RARQ",
  "timestamp" : 1751565986870,
  "status" : "yellow",
  "indices" : {
    "count" : 455,
    "shards" : {
      "total" : 951,
      "primaries" : 951,
      "replication" : 0.0,
      "index" : {
        "shards" : {
          "min" : 1,
          "max" : 6,
          "avg" : 2.0901098901098902
        },
        "primaries" : {
          "min" : 1,
          "max" : 6,
          "avg" : 2.0901098901098902
        },
        "replication" : {
          "min" : 0.0,
          "max" : 0.0,
          "avg" : 0.0
        }
      }
    },
    "docs" : {
      "count" : 666249428,
      "deleted" : 6833814
    },
    "store" : {
      "size_in_bytes" : 211550181992
    },
    "fielddata" : {
      "memory_size_in_bytes" : 0,
      "evictions" : 0
    },
    "query_cache" : {
      "memory_size_in_bytes" : 0,
      "total_count" : 18134,
      "hit_count" : 0,
      "miss_count" : 18134,
      "cache_size" : 0,
      "cache_count" : 0,
      "evictions" : 0
    },
    "completion" : {
      "size_in_bytes" : 0
    },
    "segments" : {
      "count" : 6932,
      "memory_in_bytes" : 510120074,
      "terms_memory_in_bytes" : 308515685,
      "stored_fields_memory_in_bytes" : 90701264,
      "term_vectors_memory_in_bytes" : 0,
      "norms_memory_in_bytes" : 6136384,
      "points_memory_in_bytes" : 95881245,
      "doc_values_memory_in_bytes" : 8885496,
      "index_writer_memory_in_bytes" : 1031208,
      "version_map_memory_in_bytes" : 0,
      "fixed_bit_set_memory_in_bytes" : 0,
      "max_unsafe_auto_id_timestamp" : 1751565888002,
      "file_sizes" : { }
    }
  },
  "nodes" : {
    "count" : {
      "total" : 12,
      "data" : 6,
      "coordinating_only" : 0,
      "master" : 3,
      "ingest" : 3
    },
    "versions" : [
      "6.8.23"
    ],
    "os" : {
      "available_processors" : 71,
      "allocated_processors" : 71,
      "names" : [
        {
          "name" : "Linux",
          "count" : 12
        }
      ],
      "pretty_names" : [
        {
          "pretty_name" : "Ubuntu 22.04.4 LTS",
          "count" : 12
        }
      ],
      "mem" : {
        "total_in_bytes" : 176703832064,
        "free_in_bytes" : 13444820992,
        "used_in_bytes" : 163259011072,
        "free_percent" : 8,
        "used_percent" : 92
      }
    },
    "process" : {
      "cpu" : {
        "percent" : 0
      },
      "open_file_descriptors" : {
        "min" : 454,
        "max" : 2163,
        "avg" : 1228
      }
    },
    "jvm" : {
      "max_uptime_in_millis" : 31105869371,
      "versions" : [
        {
          "version" : "11.0.27",
          "vm_name" : "OpenJDK 64-Bit Server VM",
          "vm_version" : "11.0.27+6-post-Ubuntu-0ubuntu122.04",
          "vm_vendor" : "Ubuntu",
          "count" : 7
        },
        {
          "version" : "11.0.23",
          "vm_name" : "OpenJDK 64-Bit Server VM",
          "vm_version" : "11.0.23+9-post-Ubuntu-1ubuntu122.04.1",
          "vm_vendor" : "Ubuntu",
          "count" : 5
        }
      ],
      "mem" : {
        "heap_used_in_bytes" : 57679034800,
        "heap_max_in_bytes" : 134785925120
      },
      "threads" : 809
    },
    "fs" : {
      "total_in_bytes" : 27857859821568,
      "free_in_bytes" : 25364659953664,
      "available_in_bytes" : 23955046666240
    },
    "plugins" : [ ],
    "network_types" : {
      "transport_types" : {
        "netty4" : 12
      },
      "http_types" : {
        "netty4" : 12
      }
    }
  }
}

It looks like you are running without any replica shards. If that is the case, the loss of any data node would lead to data loss. If you are putting in the effort to have dedicated master nodes, why not add resiliency through replica shards?
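For reference, replicas can be enabled on existing indices through the index settings API; a minimal sketch in the same curl style as the cluster stats call above (one replica as an illustrative value; this also requires enough data nodes and disk to hold the extra copies):

curl -XPUT -H 'Content-Type: application/json' 10.0.3.71:9200/_all/_settings -d '
{
  "index": {
    "number_of_replicas": 1
  }
}'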

If you look at the official support matrix, the OS you are running on is not supported for that Elasticsearch version. This might very well be the problem.

Given the unsupported OS version, I would recommend that you upgrade Elasticsearch (so you get onto a supported combination) to see if that resolves the issue with nodes falling over.

Is there anything in the logs of the nodes that fall over?


A quite bizarre thread with a good plot twist, but...

Why did you think it might be a memory issue? Was it a hunch, or something specific in a log somewhere?

Also, you mention "the discovery configuration is related to possible network issues". Actual network issues can be very problematic, and cross-site/location ones are probably a bit more statistically likely.

Can you re-word that into a more detailed sequence of events? Does master node 5 leave the cluster first? Is a new master elected? Does another node become master for a short time and then also crash before master node 5 is back? Do the master nodes actually crash, meaning the node itself goes down, or do they just leave the cluster? What's in the various master nodes' logs?

Incidentally, is master node 5 in the site/location with 3 master nodes, or the one with 2? Just curious.

I am presuming this architecture, with its flaws, has been working fine for a significant time? You didn't do the upgrade to Ubuntu 22.04 just yesterday or something?

@Christian_Dahlqvist I forgot to check that little detail :pensive_face: ... but it has been working for a while now without problems. Thanks Christian!
Yesterday, analyzing logs, I noticed that master nodes 1 and 2 never join the cluster except when I disconnect master 5. The only thing I saw in the logs of masters 1 and 2 was a timeout error while looking for master 5, so I decided to reboot nodes 1 and 2. After that the nodes joined and the cluster instability disappeared (some problem with the JVM, maybe).
I also think some modifications I made in jvm.options are related to that; they seem to be working.

But it's clear that I need to move to version 7 quickly. Thanks a lot for your help!

See you soon


Personally, I would not do any upgrade until after I'd addressed the lack of any data resilience. Obviously it's your system, so it's up to you.

Also, again just IMHO, I probably wouldn't upgrade the cluster until I really understood what had happened here. The explanation given is a bit hand-wavy, to my reading. Of course you will know more, and there is no compulsion to share.

That's very strange and indicates something is wrong somewhere. The 5-master-node approach isn't recommended, see above, but if "it has been working for a while now without problems" is true, why did it stop working a few days ago? Clearly I also have no idea what the "modifications I made in jvm.options" were; do you mean the heap increase?

Note the JVM versions were not 100% aligned: 7x 11.0.27 and 5x 11.0.23 (and 2x completely unknown, as those nodes weren't in the cluster when you made the cluster stats call). But the OS was reported as Ubuntu 22.04.4 LTS on all 12 nodes in the cluster at that time.

Lastly, bear in mind the comments above.

Anyway, good luck going forward!