Elasticsearch does not elect a new master after the master node goes down

Node1

cluster.name: trinix-cluster
node.name: es-node-01
node.data: true
node.master: true

cluster.initial_master_nodes:
  - es-node-01
  - es-node-02

indices.memory.index_buffer_size: 1gb
indices.fielddata.cache.size: 1g
thread_pool:
  search:
    size: 1000
    queue_size: 4000
transport.host: 192.168.2.6
transport.tcp.port: 9300
discovery.find_peers_interval: 1s
search.max_buckets: 9000000

discovery.zen.minimum_master_nodes: 2
discovery.zen.ping.unicast.hosts: ["es-node-01", "es-node-02"]
network.publish_host: ["es-node-01", "es-node-02"]

network.host: 0.0.0.0
http.port: 9200

Node2

cluster.name: trinix-cluster
node.name: es-node-02
node.data: true
node.master: true

cluster.initial_master_nodes:
  - es-node-01
  - es-node-02

indices.memory.index_buffer_size: 1gb
indices.fielddata.cache.size: 1g
thread_pool:
  search:
    size: 1000
    queue_size: 4000
transport.host: 192.168.2.4
transport.tcp.port: 9300
discovery.find_peers_interval: 1s
search.max_buckets: 9000000

discovery.zen.minimum_master_nodes: 2
discovery.zen.ping.unicast.hosts: ["es-node-01", "es-node-02"]
network.publish_host: ["es-node-01", "es-node-02"]

network.host: 0.0.0.0
http.port: 9200

The problem is that if Elasticsearch on es-node-01 is stopped, es-node-02 does not take over as master:

[2019-07-22T06:01:02,004][INFO ][o.e.c.s.ClusterApplierService] [es-node-02] master node changed {previous [{es-node-01}{9Jb373AuT-iU_pS9Wf1_Iw}{XnG2oGApQ_CmviKx7W0s4g}{192.168.2.6}{192.168.2.6:9300}{ml.machine_memory=8196714496, ml.max_open_jobs=20, xpack.installed=true}], current }, term: 93, version: 3582, reason: becoming candidate: onLeaderFailure
[2019-07-22T06:01:02,121][WARN ][o.e.a.b.TransportShardBulkAction] [es-node-02] [[metricbeat-ps-app2-2019.07][0]] failed to perform indices:data/write/bulk[s] on replica [metricbeat-pojoksatu-app2-2019.07][0], node[9Jb373AuT-iU_pS9Wf1_Iw], [R], s[STARTED], a[id=0rBO-3qfQ6WcEry_xvLgIg]
org.elasticsearch.transport.NodeDisconnectedException: [es-node-01][192.168.2.6:9300][indices:data/write/bulk[s][r]] disconnected
[2019-07-22T06:01:02,141][WARN ][o.e.c.a.s.ShardStateAction] [es-node-02] no master known for action [internal:cluster/shard/failure] for shard entry [shard id [[metricbeat-pojoksatu-app2-2019.07][0]], allocation id [0rBO-3qfQ6WcEry_xvLgIg], primary term [25], message [failed to perform indices:data/write/bulk[s] on replica [metricbeat-pojoksatu-app2-2019.07][0], node[9Jb373AuT-iU_pS9Wf1_Iw], [R], s[STARTED], a[id=0rBO-3qfQ6WcEry_xvLgIg]], failure [NodeDisconnectedException[[es-node-01][192.168.2.6:9300][indices:data/write/bulk[s][r]] disconnected]], markAsStale [true]]
[2019-07-22T06:01:02,121][WARN ][o.e.c.NodeConnectionsService] [es-node-02] failed to connect to {es-node-01}{9Jb373AuT-iU_pS9Wf1_Iw}{XnG2oGApQ_CmviKx7W0s4g}{192.168.2.6}{192.168.2.6:9300}{ml.machine_memory=8196714496, ml.max_open_jobs=20, xpack.installed=true} (tried [1] times)
org.elasticsearch.transport.ConnectTransportException: [es-node-01][192.168.2.6:9300] connect_exception
at

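That is expected. Elasticsearch master election is based on consensus, and a majority of master-eligible nodes is required. With only two master-eligible nodes, the survivor is not guaranteed to hold a majority by itself, so the cluster cannot reliably tolerate the loss of either one. You therefore need a minimum of 3 master-eligible nodes in order to have a highly available cluster that can handle one master-eligible node being down.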
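
To make the arithmetic behind that concrete, here is a minimal Python sketch of a plain majority rule. It is only an illustration: the function names `quorum` and `tolerated_failures` are made up for this example, and it deliberately ignores the details of how Elasticsearch manages its voting configuration internally.

```python
def quorum(master_eligible: int) -> int:
    """Smallest strict majority of the master-eligible nodes (simplified model)."""
    return master_eligible // 2 + 1


def tolerated_failures(master_eligible: int) -> int:
    """How many master-eligible nodes can be down while a majority remains."""
    return master_eligible - quorum(master_eligible)


# Compare a few cluster sizes, including the two-node setup from the question.
for n in (1, 2, 3, 5):
    print(
        f"{n} master-eligible node(s): "
        f"quorum={quorum(n)}, tolerates {tolerated_failures(n)} down"
    )
```

Under this simplified model, two master-eligible nodes need both to agree, so no failure is tolerated, while three nodes need two and can lose one.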
