Problem restarting cluster on 6.6.1

Elasticsearch 6.6.1

I have a 3-node cluster: 2 master+data nodes and 1 data-only node. The setup has been running fine for a few months, but problems start whenever I have to restart the machines for any reason.

The problem is that I have to restart the machines a seemingly random number of times to get past the "master not discovered" exception. Today the situation was even stranger: cluster health reported 2 nodes, then 3, then 2, then 3, and so on. My configuration has not changed since the beginning. Today I spent an hour restarting the service over and over until it eventually fixed itself, and I now see a recovering cluster with 3 nodes.
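For reference, this is roughly how I am watching the node count, assuming the default HTTP port 9200 (the elastic user and PASSWORD below are placeholders, since x-pack security is enabled):

# poll cluster health every 2 seconds; number_of_nodes keeps flapping between 2 and 3
watch -n 2 'curl -s -u elastic:PASSWORD "http://esdata-0:9200/_cluster/health?pretty"'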

This is my configuration. I'm also open to suggestions for the VM specs of these machines. Each node is a 2-core, 8 GB machine with the JVM heap set to 4 g, with plans to scale them up as needed.
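A quick way to confirm the heap each node actually got (same placeholder credentials as above):

curl -s -u elastic:PASSWORD "http://esdata-0:9200/_cat/nodes?v&h=name,heap.max,ram.max"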

esdata-0
cluster.name: "analytics-cluster"
node.name: "esdata-0"
path.logs: /var/log/elasticsearch
path.data: /datadisks/disk1/elasticsearch/data
discovery.zen.ping.unicast.hosts: ["esdata-0:9300","esdata-1:9300","esdata-2:9300"]
node.master: true
node.data: true
discovery.zen.minimum_master_nodes: 2
network.host: [site, local]
node.max_local_storage_nodes: 1
node.attr.fault_domain: 1
node.attr.update_domain: 1
cluster.routing.allocation.awareness.attributes: fault_domain,update_domain
xpack.license.self_generated.type: trial
xpack.security.enabled: true
bootstrap.memory_lock: true
thread_pool.index.queue_size: 1000
thread_pool.write.queue_size: 1000

esdata-1
cluster.name: "analytics-cluster"
node.name: "esdata-1"
path.logs: /var/log/elasticsearch
path.data: /datadisks/disk1/elasticsearch/data
discovery.zen.ping.unicast.hosts: ["esdata-0:9300","esdata-1:9300","esdata-2:9300"]
node.master: true
node.data: true
discovery.zen.minimum_master_nodes: 2
network.host: [site, local]
node.max_local_storage_nodes: 1
node.attr.fault_domain: 0
node.attr.update_domain: 0
cluster.routing.allocation.awareness.attributes: fault_domain,update_domain
xpack.license.self_generated.type: trial
xpack.security.enabled: true
bootstrap.memory_lock: true
thread_pool.index.queue_size: 1000
thread_pool.write.queue_size: 1000

esdata-2
cluster.name: "analytics-cluster"
node.name: "esdata-2"
path.logs: /var/log/elasticsearch
path.data: /datadisks/disk1/elasticsearch/data
discovery.zen.ping.unicast.hosts: ["esdata-0:9300","esdata-1:9300","esdata-2:9300"]
node.master: false
node.data: true
discovery.zen.minimum_master_nodes: 2
network.host: [site, local]
node.max_local_storage_nodes: 1
node.attr.fault_domain: 2
node.attr.update_domain: 2
cluster.routing.allocation.awareness.attributes: fault_domain,update_domain
xpack.license.self_generated.type: trial
xpack.security.enabled: true
bootstrap.memory_lock: true
thread_pool.index.queue_size: 1000
thread_pool.write.queue_size: 1000

Can you share the logs from your failed attempts to get these nodes to form a cluster?

Also, why do you have one of your three nodes set to node.master: false? You need at least three master-eligible nodes to run a fault-tolerant cluster: with only two master-eligible nodes and discovery.zen.minimum_master_nodes: 2, neither of them can be elected master while the other is down or restarting, so the cluster cannot recover until both are back.
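If you want to double-check which nodes are currently master-eligible and which one holds the elected master, something along these lines should show it (standard _cat API; adjust the host, port and credentials for your setup):

curl -s -u elastic:PASSWORD "http://esdata-0:9200/_cat/nodes?v&h=name,node.role,master"

A node.role of mdi means master-eligible + data + ingest, di means data-only, and the * in the master column marks the elected master.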

For some reason I had set one of the nodes as non-master-eligible. I changed that immediately after reading your comment, and today one of our nodes crashed (I suspect out of memory, but that's not the issue here). Now I have the same problem with 3 master-eligible nodes: the status keeps switching between 2/2 and 3/3 nodes, a few seconds apart.
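Next time it flaps I plan to ask each node directly which master it sees, to check whether they agree (same placeholder credentials as above):

curl -s -u elastic:PASSWORD "http://esdata-0:9200/_cat/master?v"
curl -s -u elastic:PASSWORD "http://esdata-1:9200/_cat/master?v"
curl -s -u elastic:PASSWORD "http://esdata-2:9200/_cat/master?v"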

Here is a log file: https://pastebin.com/nuHhEPJi (there was a LOT of log noise from services trying to use the ES API, which I tried to strip out; hopefully I did not exclude anything important).

After a number of restart attempts the cluster came up OK and is working again. I would like to avoid this in the future. Shouldn't the cluster come back online after restarting just the failed node?
