Hello,
I am using Elasticsearch v7.17.0. Here is the configuration I am testing the new version with:
I have three dedicated master nodes and one data node. This sets up my cluster and I am in business. However, when I scale down the master nodes (and in turn terminate the active master node), I lose the cluster state information.
Error:
curl http://escluster.abc.internal:9200/_cat/nodes?v
{"error":{"root_cause":[{"type":"master_not_discovered_exception","reason":null}],"type":"master_not_discovered_exception","reason":null},"status":503}
When I set up the initial cluster I only had one master node and one data node, and I used this property to set up the initial master configuration. Later I added two more master nodes, which meant updating cluster.initial_master_nodes to add two more entries.
/etc/elasticsearch/elasticsearch.yml:
cluster.name: escluster
network.host: _local:ipv4_, _site_
path.data: /data/lib
path.logs: /data/log
cluster.initial_master_nodes: [ip-10-192-106-161,ip-10-192-105-166,ip-10-192-108-236]
discovery.seed_providers: ec2
discovery.ec2.groups: sg-xxxxx
discovery.ec2.endpoint: ec2.us-east-1.amazonaws.com
node.roles: master
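One way to check whether the two added master nodes actually joined the voting configuration is to read the committed voting configuration from the cluster state while a master is still elected. Below is a minimal Python sketch, assuming the cluster is reachable at the URL used in the curl command above and using only the standard library; the function names are hypothetical.

```python
import json
import urllib.request
from typing import List

# Assumed base URL, taken from the curl command in the post.
BASE_URL = "http://escluster.abc.internal:9200"


def voting_config_request(base_url: str = BASE_URL) -> urllib.request.Request:
    """Build a GET for the committed voting configuration (node IDs only)."""
    url = (base_url.rstrip("/") + "/_cluster/state"
           "?filter_path=metadata.cluster_coordination.last_committed_config")
    return urllib.request.Request(url, method="GET")


def parse_voting_config(body: str) -> List[str]:
    """Extract the list of node IDs from the filtered cluster state response."""
    doc = json.loads(body)
    return doc["metadata"]["cluster_coordination"]["last_committed_config"]


def fetch_voting_config(base_url: str = BASE_URL) -> List[str]:
    """Send the request; this only works while a master is elected."""
    with urllib.request.urlopen(voting_config_request(base_url), timeout=10) as resp:
        return parse_voting_config(resp.read().decode("utf-8"))
```

With three healthy master-eligible nodes in the voting configuration this should return three node IDs. If it returns a single ID, losing that one node leaves no majority to elect a new master.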
If I do this more gracefully, that is, first remove the active master node from the voting configuration and then terminate it, the cluster stays healthy. However, in a real-world scenario systems can come and go, so I want to understand how I can achieve resiliency across my cluster of master nodes.
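The graceful procedure described above corresponds to the voting configuration exclusions API (POST and DELETE on /_cluster/voting_config_exclusions, with the node_names parameter supported in 7.17). Below is a sketch that only builds the requests with Python's standard library; the helper names are hypothetical, and nothing is sent until send() is called against a live cluster.

```python
import urllib.parse
import urllib.request
from typing import Iterable

# Path of the voting configuration exclusions API.
EXCLUSIONS_PATH = "/_cluster/voting_config_exclusions"


def exclude_request(base_url: str, node_names: Iterable[str]) -> urllib.request.Request:
    """Build a POST that excludes the named master nodes from the voting configuration."""
    query = urllib.parse.urlencode({"node_names": ",".join(node_names)}, safe=",")
    return urllib.request.Request(
        base_url.rstrip("/") + EXCLUSIONS_PATH + "?" + query, method="POST")


def clear_exclusions_request(base_url: str) -> urllib.request.Request:
    """Build a DELETE that clears the exclusions list once the node is gone."""
    return urllib.request.Request(base_url.rstrip("/") + EXCLUSIONS_PATH, method="DELETE")


def send(req: urllib.request.Request, timeout: float = 30.0) -> int:
    """Send a request to a live cluster and return the HTTP status code."""
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.status
```

For example, send(exclude_request("http://escluster.abc.internal:9200", ["ip-10-192-106-161"])) before terminating that node, then send(clear_exclusions_request(...)) afterwards. By default the DELETE call waits for the excluded nodes to leave the cluster before clearing the list.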
Also note that I thought cluster.initial_master_nodes is only needed during the initial bootstrap and is no longer used after that, so I am confused why this test is not resilient.
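After bootstrap, whether losing one master is survivable depends on the voting configuration stored in the cluster state rather than on cluster.initial_master_nodes: electing a master needs votes from more than half of the nodes in that voting configuration. Below is a deliberately simplified Python model of that majority check; it ignores the last accepted configuration that Elasticsearch also takes into account, and the function name is hypothetical.

```python
from typing import Iterable


def has_quorum(voting_config: Iterable[str], reachable: Iterable[str]) -> bool:
    """Simplified check: do the reachable nodes form a strict majority of the voting configuration?"""
    config = set(voting_config)
    votes = len(config & set(reachable))
    # A strict majority means more than half of the configured voters.
    return votes * 2 > len(config)


# Three voters, one lost: the remaining two still form a majority.
print(has_quorum(["a", "b", "c"], ["b", "c"]))
# Voting configuration with a single voter (IDs from the log in this post):
# the surviving node is not a voter, so no election can succeed.
print(has_quorum(["n8lziZmbTgqKYEvMlc5krg"], ["FIhJ-P81SvaXTFkjq-PRbg"]))
```

The second case matches the log entries in this post: the election requires only n8lziZmbTgqKYEvMlc5krg, so FIhJ-P81SvaXTFkjq-PRbg cannot win an election on its own.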
Adding some log entries as well. Once the active master node is lost, this is the error message logged by the other master node:
[2022-02-14T10:34:39,474][WARN ][o.e.c.c.ClusterFormationFailureHelper] [ip-10-192-105-166] master not discovered or elected yet, an election requires a node with id [n8lziZmbTgqKYEvMlc5krg], have only discovered non-quorum [{ip-10-192-105-166}{FIhJ-P81SvaXTFkjq-PRbg}{1tVS0hmtQc6-phTARZjtFg}{10.192.105.166}{10.192.105.166:9300}{m}]; discovery will continue using [127.0.0.1:9301, 127.0.0.1:9302, 127.0.0.1:9303, 127.0.0.1:9304, 127.0.0.1:9305, [::1]:9300, [::1]:9301, [::1]:9302, [::1]:9303, [::1]:9304, [::1]:9305, 10.192.105.38:9300, 10.192.105.166:9300] from hosts providers and [{ip-10-192-106-161}{n8lziZmbTgqKYEvMlc5krg}{L0GCQ91fTxGASbIASxTenQ}{10.192.106.161}{10.192.106.161:9300}{m}, {ip-10-192-105-166}{FIhJ-P81SvaXTFkjq-PRbg}{1tVS0hmtQc6-phTARZjtFg}{10.192.105.166}{10.192.105.166:9300}{m}] from last-known cluster state; node term 4, last-accepted version 86 in term 4
[2022-02-14T10:34:49,475][WARN ][o.e.c.c.ClusterFormationFailureHelper] [ip-10-192-105-166] master not discovered or elected yet, an election requires a node with id [n8lziZmbTgqKYEvMlc5krg], have only discovered non-quorum [{ip-10-192-105-166}{FIhJ-P81SvaXTFkjq-PRbg}{1tVS0hmtQc6-phTARZjtFg}{10.192.105.166}{10.192.105.166:9300}{m}]; discovery will continue using [127.0.0.1:9301, 127.0.0.1:9302, 127.0.0.1:9303, 127.0.0.1:9304, 127.0.0.1:9305, [::1]:9300, [::1]:9301, [::1]:9302, [::1]:9303, [::1]:9304, [::1]:9305, 10.192.105.38:9300, 10.192.105.166:9300] from hosts providers and [{ip-10-192-106-161}{n8lziZmbTgqKYEvMlc5krg}{L0GCQ91fTxGASbIASxTenQ}{10.192.106.161}{10.192.106.161:9300}{m}, {ip-10-192-105-166}{FIhJ-P81SvaXTFkjq-PRbg}{1tVS0hmtQc6-phTARZjtFg}{10.192.105.166}{10.192.105.166:9300}{m}] from last-known cluster state; node term 4, last-accepted version 86 in term 4
[2022-02-14T10:34:59,477][WARN ][o.e.c.c.ClusterFormationFailureHelper] [ip-10-192-105-166] master not discovered or elected yet, an election requires a node with id [n8lziZmbTgqKYEvMlc5krg], have only discovered non-quorum [{ip-10-192-105-166}{FIhJ-P81SvaXTFkjq-PRbg}{1tVS0hmtQc6-phTARZjtFg}{10.192.105.166}{10.192.105.166:9300}{m}]; discovery will continue using [127.0.0.1:9301, 127.0.0.1:9302, 127.0.0.1:9303, 127.0.0.1:9304, 127.0.0.1:9305, [::1]:9300, [::1]:9301, [::1]:9302, [::1]:9303, [::1]:9304, [::1]:9305, 10.192.105.38:9300, 10.192.105.166:9300] from hosts providers and [{ip-10-192-106-161}{n8lziZmbTgqKYEvMlc5krg}{L0GCQ91fTxGASbIASxTenQ}{10.192.106.161}{10.192.106.161:9300}{m}, {ip-10-192-105-166}{FIhJ-P81SvaXTFkjq-PRbg}{1tVS0hmtQc6-phTARZjtFg}{10.192.105.166}{10.192.105.166:9300}{m}] from last-known cluster state; node term 4, last-accepted version 86 in term 4
[2022-02-14T10:35:09,479][WARN ][o.e.c.c.ClusterFormationFailureHelper] [ip-10-192-105-166] master not discovered or elected yet, an election requires a node with id [n8lziZmbTgqKYEvMlc5krg], have only discovered non-quorum [{ip-10-192-105-166}{FIhJ-P81SvaXTFkjq-PRbg}{1tVS0hmtQc6-phTARZjtFg}{10.192.105.166}{10.192.105.166:9300}{m}]; discovery will continue using [127.0.0.1:9301, 127.0.0.1:9302, 127.0.0.1:9303, 127.0.0.1:9304, 127.0.0.1:9305, [::1]:9300, [::1]:9301, [::1]:9302, [::1]:9303, [::1]:9304, [::1]:9305, 10.192.105.38:9300, 10.192.105.166:9300] from hosts providers and [{ip-10-192-106-161}{n8lziZmbTgqKYEvMlc5krg}{L0GCQ91fTxGASbIASxTenQ}{10.192.106.161}{10.192.106.161:9300}{m}, {ip-10-192-105-166}{FIhJ-P81SvaXTFkjq-PRbg}{1tVS0hmtQc6-phTARZjtFg}{10.192.105.166}{10.192.105.166:9300}{m}] from last-known cluster state; node term 4, last-accepted version 86 in term 4
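To compare "an election requires ..." with "have only discovered non-quorum ..." across many such lines, a small parser can pull out the required and discovered node IDs. Below is a sketch in Python, assuming the node descriptor layout seen in these lines ({name}{nodeId}{ephemeralId}{host}{address}{roles}); the regexes and function names are illustrative, not part of Elasticsearch.

```python
import re
from typing import List, Tuple

# Text inside the first [...] after "an election requires".
_REQUIRES = re.compile(r"an election requires [^\[]*\[([^\]]*)\]")
# Node descriptors listed after "have only discovered non-quorum".
_DISCOVERED = re.compile(r"have only discovered non-quorum \[(.*?)\];")


def _node_ids(section: str) -> List[str]:
    """Split '{..}{..}, {..}{..}' into nodes and take the second field (node ID) of each."""
    ids = []
    for node in re.split(r"(?<=\}),\s*(?=\{)", section.strip()):
        fields = re.findall(r"\{([^}]*)\}", node)
        if len(fields) >= 2:
            ids.append(fields[1])
    return ids


def parse_formation_warning(line: str) -> Tuple[List[str], List[str]]:
    """Return (required node IDs, discovered node IDs) from one warning line."""
    req = _REQUIRES.search(line)
    disc = _DISCOVERED.search(line)
    required = [s.strip() for s in req.group(1).split(",")] if req else []
    discovered = _node_ids(disc.group(1)) if disc else []
    return required, discovered
```

Applied to the lines above, this reports that the only required node, n8lziZmbTgqKYEvMlc5krg (ip-10-192-106-161 in the last-known cluster state), is not among the discovered nodes.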
Any help is appreciated.
-Cross