Elasticsearch 6.6.1
I have a 3-node cluster: 2 master+data nodes and 1 data-only node. The setup has been running fine for a few months, except whenever I have to restart the nodes for any reason.
The problem is that I have to restart the machines multiple times, seemingly at random, to get past the "master not discovered" exception. Today the situation was even stranger: cluster health reported 2 nodes, then 3 nodes, then 2, then 3, and so on. My configuration has been the same from the beginning. Today I spent an hour restarting the service over and over until it fixed itself, and now I see the cluster recovering with 3 nodes.
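To describe the flapping more precisely, something like the following could record the node count over time. This is only a rough sketch: it assumes the HTTP API is reachable on port 9200 and that basic-auth credentials exist for it (the URL, user, and password are placeholders, and the helper names are my own). It reads only the number_of_nodes field from GET /_cluster/health.

```python
import base64
import json
import time
import urllib.error
import urllib.request


def node_count(health_body):
    """Extract number_of_nodes from a /_cluster/health JSON response body."""
    return json.loads(health_body)["number_of_nodes"]


def count_changes(counts):
    """Count how many times consecutive samples differ (e.g. 2 -> 3 -> 2)."""
    return sum(1 for prev, cur in zip(counts, counts[1:]) if prev != cur)


def fetch_health(base_url, user, password):
    """GET /_cluster/health with basic auth (placeholder credentials)."""
    req = urllib.request.Request(base_url + "/_cluster/health")
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    req.add_header("Authorization", "Basic " + token)
    with urllib.request.urlopen(req, timeout=5) as resp:
        return resp.read().decode()


def watch(base_url, user, password, samples=10, interval=5.0):
    """Sample the node count; None marks a failed request (e.g. no master)."""
    counts = []
    for _ in range(samples):
        try:
            counts.append(node_count(fetch_health(base_url, user, password)))
        except urllib.error.URLError:
            counts.append(None)
        time.sleep(interval)
    return counts


# Hypothetical usage:
# counts = watch("http://esdata-0:9200", "elastic", "changeme")
# print(counts, "changes:", count_changes(counts))
```

A sequence such as 2, 3, 2, 3 would count as three changes, which is roughly what I was seeing by hand.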
This is my configuration. I'm also open to suggestions about VM specs for these machines. Each node is a 2-core, 8 GB machine with JVM memory set to 4g, and I plan to scale them up as needed.
esdata-0
cluster.name: "analytics-cluster"
node.name: "esdata-0"
path.logs: /var/log/elasticsearch
path.data: /datadisks/disk1/elasticsearch/data
discovery.zen.ping.unicast.hosts: ["esdata-0:9300","esdata-1:9300","esdata-2:9300"]
node.master: true
node.data: true
discovery.zen.minimum_master_nodes: 2
network.host: [site, local]
node.max_local_storage_nodes: 1
node.attr.fault_domain: 1
node.attr.update_domain: 1
cluster.routing.allocation.awareness.attributes: fault_domain,update_domain
xpack.license.self_generated.type: trial
xpack.security.enabled: true
bootstrap.memory_lock: true
thread_pool.index.queue_size: 1000
thread_pool.write.queue_size: 1000
esdata-1
cluster.name: "analytics-cluster"
node.name: "esdata-1"
path.logs: /var/log/elasticsearch
path.data: /datadisks/disk1/elasticsearch/data
discovery.zen.ping.unicast.hosts: ["esdata-0:9300","esdata-1:9300","esdata-2:9300"]
node.master: true
node.data: true
discovery.zen.minimum_master_nodes: 2
network.host: [site, local]
node.max_local_storage_nodes: 1
node.attr.fault_domain: 0
node.attr.update_domain: 0
cluster.routing.allocation.awareness.attributes: fault_domain,update_domain
xpack.license.self_generated.type: trial
xpack.security.enabled: true
bootstrap.memory_lock: true
thread_pool.index.queue_size: 1000
thread_pool.write.queue_size: 1000
esdata-2
cluster.name: "analytics-cluster"
node.name: "esdata-2"
path.logs: /var/log/elasticsearch
path.data: /datadisks/disk1/elasticsearch/data
discovery.zen.ping.unicast.hosts: ["esdata-0:9300","esdata-1:9300","esdata-2:9300"]
node.master: false
node.data: true
discovery.zen.minimum_master_nodes: 2
network.host: [site, local]
node.max_local_storage_nodes: 1
node.attr.fault_domain: 2
node.attr.update_domain: 2
cluster.routing.allocation.awareness.attributes: fault_domain,update_domain
xpack.license.self_generated.type: trial
xpack.security.enabled: true
bootstrap.memory_lock: true
thread_pool.index.queue_size: 1000
thread_pool.write.queue_size: 1000
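
For context on the discovery.zen.minimum_master_nodes value above, here is a minimal sketch of the quorum arithmetic. It assumes the commonly documented Zen discovery rule of (master-eligible nodes / 2) + 1; the function name is illustrative and not part of Elasticsearch.

```python
# Illustrative only: quorum arithmetic for discovery.zen.minimum_master_nodes.
# Assumes the commonly documented rule (master_eligible_nodes // 2) + 1.

def zen_quorum(master_eligible):
    """Smallest majority of master-eligible nodes."""
    return master_eligible // 2 + 1


# node.master values copied from the three configs above
node_master = {"esdata-0": True, "esdata-1": True, "esdata-2": False}

eligible = sum(node_master.values())
quorum = zen_quorum(eligible)
tolerated = eligible - quorum  # master-eligible nodes that may be down

print(f"master-eligible={eligible} quorum={quorum} tolerated={tolerated}")
```

With two master-eligible nodes the quorum works out to 2, which matches my setting, but it also means both esdata-0 and esdata-1 have to be up and able to see each other before a master can be elected.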