Hello all.
Setup:
A configurable number of data nodes (3 at the moment), one dedicated master node and a configurable number of ingest nodes (based on load), all in the same private subnet inside a VPC.
The master node exists in its own AutoScalingGroup with Min: 1, Max: 1, Desired: 1.
Kibana exists on a single node in a public subnet and talks to the ES cluster via an internal AWS::ELB.
Configuration:
- For the data node
    network:
      publish_host: "_ec2:privateIpv4_"
      host: "0.0.0.0"

    discovery:
      zen:
        ping:
          multicast:
            enabled: false
      type: "ec2"
      host_type: "private_ip"
      ec2:
        tag:
          node: "es-node"

    cloud:
      node:
        auto_attributes: true
      aws:
        protocol: "http"
        region: "ap-southeast-2"
- For the master node
    discovery:
      zen:
        ping:
          multicast:
            enabled: false
      type: "ec2"
      host_type: "private_ip"
      ec2:
        tag:
          node: "es-node"

    cloud:
      node:
        auto_attributes: true
      aws:
        protocol: "http"
        region: "ap-southeast-2"

    network:
      publish_host: "_ec2:privateIpv4_"
      host: "0.0.0.0"
- For the ingest node
    discovery:
      zen:
        ping:
          multicast:
            enabled: false
      type: "ec2"
      host_type: "private_ip"
      ec2:
        tag:
          node: "es-node"

    cloud:
      node:
        auto_attributes: true
      aws:
        protocol: "http"
        region: "ap-southeast-2"

    network:
      publish_host: "_ec2:privateIpv4_"
      host: "0.0.0.0"
Issue:
Every now and then, at random, the master node goes missing. By missing I mean that the ingest nodes at times cannot find the master node and hence cannot add new data at all. The only thing that seems to get them to find the master node again is restarting Elasticsearch on the master node, which rebalances the entire cluster and causes me to cry into my coffee.
And yet, after the restart on the master node, all nodes can find the master easily and everything works like clockwork, until it happens again.
And it's not only the ingest nodes that cannot find the master node. At times the Kibana node, via the ELB, also cannot find the master node at all, causing Kibana to give me all sorts of heart-stopping messages.
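Because the failure is intermittent, it may help to record what each node believes at the moment it happens. The following is a minimal sketch, not a fix: it asks every node for its own local view of the cluster state and reports which nodes see no master. It assumes the HTTP API is reachable on port 9200 without authentication; the function names `fetch_master`, `summarize` and `check` are made up for this illustration, and the host list has to be supplied by the caller.

```python
import json
import urllib.request


def fetch_master(host, port=9200, timeout=5):
    """Return the master node id that `host` currently sees, or None.

    local=true asks the node for its own copy of the cluster state
    instead of forwarding the request to the elected master.
    """
    url = "http://%s:%d/_cluster/state/master_node?local=true" % (host, port)
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            state = json.loads(resp.read().decode("utf-8"))
    except (OSError, ValueError):
        # Connection errors, timeouts, HTTP errors and bad JSON all count
        # as "this node could not tell us who the master is".
        return None
    return state.get("master_node")


def summarize(views):
    """Summarize a {host: master_id_or_None} mapping."""
    missing = sorted(h for h, m in views.items() if m is None)
    masters = sorted({m for m in views.values() if m is not None})
    return {
        "missing": missing,   # hosts that see no master
        "masters": masters,   # distinct master ids reported
        "healthy": not missing and len(masters) == 1,
    }


def check(hosts):
    """Query every host and return the summary."""
    return summarize({h: fetch_master(h) for h in hosts})
```

Running `check([...])` with the private IPs of the data, master and ingest nodes from a cron job or a loop, and logging the result with a timestamp, would show whether only some nodes lose the master or all of them do at once.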
Attempts at resolution:
I have searched SO, reddit.com/r/elasticsearch and discuss.elastic.co high and low, but I cannot find anything that resembles a solution, especially since the problem is intermittent.
Any help would be appreciated.
Reward Offered
I will sing "The Hills Are Alive" from The Sound of Music loudly in my office!
Thanks