Elasticsearch master node failover

Hello. I have an Elasticsearch cluster (for Graylog) running Elasticsearch-oss 7.10.2.
The cluster has 3 nodes: 2 data/master nodes and 1 dedicated master node.

# curl -XGET 'localhost:9200/_cat/nodes?v&pretty'
ip           heap.percent ram.percent cpu load_1m load_5m load_15m node.role master name
10.207.20.22           53          72   0    0.02    0.01     0.00 imr       *      monlog02p
10.207.20.24           34         100   7    0.70    0.49     0.53 dimr      -      monlog04p
10.207.20.25           30         100   2    0.32    0.38     0.43 dimr      -      monlog05p

All nodes are master-eligible, but when I stop the current master node (monlog02p), the cluster doesn't elect a new master:

{"type": "server", "timestamp": "2022-06-17T10:56:25,382Z", "level": "WARN", "component": "o.e.c.c.ClusterFormationFailureHelper", "cluster.name": "graylog", "node.name": "monlog05p", "message": "master not discovered or elected yet, an election requires a node with id [bol3_--gRTeY40YE75jMEg], have discovered [{monlog05p}{Wks5aVZIQdKhh-UQfN6Kjw}{p_43YmiFTUWLsz2qi9zLcg}{10.207.20.25}{10.207.20.25:9300}{dimr}, {monlog04p}{RWc0y6fuRZOOy92Ufr5PJQ}{gwTE_agiSEGaLsUMZKm7XA}{10.207.20.24}{10.207.20.24:9300}{dimr}] which is not a quorum; discovery will continue using [10.207.20.24:9300, 10.207.20.22:9300] from hosts providers and [{monlog05p}{Wks5aVZIQdKhh-UQfN6Kjw}{p_43YmiFTUWLsz2qi9zLcg}{10.207.20.25}{10.207.20.25:9300}{dimr}, {monlog04p}{RWc0y6fuRZOOy92Ufr5PJQ}{gwTE_agiSEGaLsUMZKm7XA}{10.207.20.24}{10.207.20.24:9300}{dimr}, {monlog02p}{bol3_--gRTeY40YE75jMEg}{vRDVG-R9RVyx5XauH_4v9Q}{10.207.20.22}{10.207.20.22:9300}{imr}] from last-known cluster state; node term 52, last-accepted version 2847 in term 52", "cluster.uuid": "Ctv7jUkOS9q0f4FJhoy3Ow", "node.id": "Wks5aVZIQdKhh-UQfN6Kjw"  }
# curl -X GET "localhost:9200/_cluster/state?filter_path=metadata.cluster_coordination.voting_config_exclusions&pretty"
{
  "metadata" : {
    "cluster_coordination" : {
      "voting_config_exclusions" : [
        {
          "node_id" : "Wks5aVZIQdKhh-UQfN6Kjw",
          "node_name" : "monlog05p"
        },
        {
          "node_id" : "bol3_--gRTeY40YE75jMEg",
          "node_name" : "monlog02p"
        },
        {
          "node_id" : "_absent_",
          "node_name" : "node_name"
        },
        {
          "node_id" : "RWc0y6fuRZOOy92Ufr5PJQ",
          "node_name" : "monlog04p"
        }
      ]
    }
  }
}

What should I do?

You should clear your voting config exclusions. See the docs on voting configuration exclusions, in particular:

Clusters should have no voting configuration exclusions in normal operation.

Also, 7.10 has passed EOL and is no longer supported. You should upgrade to a supported version as a matter of urgency.
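For example, a DELETE against the voting config exclusions API should clear the list (assuming Elasticsearch is reachable on localhost:9200, as in your earlier commands):

curl -X DELETE "localhost:9200/_cluster/voting_config_exclusions"

These exclusions are normally added with a POST to the same endpoint when permanently removing master-eligible nodes, and they are meant to be cleared again once that removal is complete.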

# curl -XDELETE 'localhost:9200/_cluster/voting_config_exclusions'
{"error":{"root_cause":[{"type":"timeout_exception","reason":"timed out waiting for removal of nodes; if nodes should not be removed, set waitForRemoval to false. [{monlog05p}{Wks5aVZIQdKhh-UQfN6Kjw}, {monlog02p}{bol3_--gRTeY40YE75jMEg}, {node_name}{_absent_}, {monlog04p}{RWc0y6fuRZOOy92Ufr5PJQ}]"}],"type":"timeout_exception","reason":"timed out waiting for removal of nodes; if nodes should not be removed, set waitForRemoval to false. [{monlog05p}{Wks5aVZIQdKhh-UQfN6Kjw}, {monlog02p}{bol3_--gRTeY

It doesn't work...
Graylog only supports this version, unfortunately.

I see, there's a small bug in the error message that #87828 fixes. It should read:

if nodes should not be removed, set ?wait_for_removal=false

Set that parameter and it should work.

curl -X DELETE "localhost:9200/_cluster/voting_config_exclusions?wait_for_removal=false"
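Once that succeeds, the exclusions list from your earlier _cluster/state call should come back empty, and stopping the current master should then trigger a normal election:

curl -X GET "localhost:9200/_cluster/state?filter_path=metadata.cluster_coordination.voting_config_exclusions&pretty"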

That worked! Great, thanks.
