I have a small bare-metal cluster with 9 worker and 3 master nodes.
This is the elasticsearch.yml for the master nodes:
cluster.name: baremetal_elastic
node.name: "${HOSTNAME}"
path.conf: "/etc/elasticsearch"
path.data: "/home/elasticsearch/elasticsearch_data"
path.logs: "/var/log/elasticsearch"
network.host: _eth0:ipv4_
script.engine.groovy.inline.aggs: 'on'
discovery.zen.ping.unicast.hosts: elastic_009, elastic_010, elastic_011
discovery.zen.minimum_master_nodes: '2'
node.data: 'false'
node.master: 'true'
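As a side note, I believe the comma-separated discovery line above is equivalent to an explicit YAML list (ports default to 9300 unless given explicitly); written out it would be:

```yaml
# Equivalent (as far as I know) list form of the unicast hosts setting;
# a port can be appended per host, e.g. "elastic_009:9300".
discovery.zen.ping.unicast.hosts:
  - elastic_009
  - elastic_010
  - elastic_011
```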
The worker nodes have the same configuration, except that node.data is set to true and node.master is set to false.
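Spelled out, a worker's elasticsearch.yml is identical to the one above apart from the two role lines:

```yaml
# Worker (data) node roles -- everything else is the same as on the masters:
node.data: 'true'
node.master: 'false'
```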
When I look at the logs (in this case, on a master node) I see this:
[2017-01-05T14:38:12,757][INFO ][o.e.p.PluginsService ] [elastic_009] no plugins loaded
[2017-01-05T14:38:14,590][INFO ][o.e.n.Node ] [elastic_009] initialized
[2017-01-05T14:38:14,590][INFO ][o.e.n.Node ] [elastic_009] starting ...
[2017-01-05T14:38:14,747][INFO ][o.e.t.TransportService ] [elastic_009] publish_address {153.77.130.74:9300}, bound_addresses {153.77.130.74:9300}
[2017-01-05T14:38:14,750][INFO ][o.e.b.BootstrapCheck ] [elastic_009] bound or publishing to a non-loopback or non-link-local address, enforcing bootstrap checks
[2017-01-05T14:38:44,803][WARN ][o.e.n.Node ] [elastic_009] timed out while waiting for initial discovery state - timeout: 30s
[2017-01-05T14:38:44,841][INFO ][o.e.h.HttpServer ] [elastic_009] publish_address {153.77.130.74:9200}, bound_addresses {153.77.130.74:9200}
[2017-01-05T14:38:44,841][INFO ][o.e.n.Node ] [elastic_009] started
All these machines are connected to the local network through eth0, which is why network.host is set to _eth0:ipv4_; it resolves correctly to the machine's IP and port, as can be seen in the log.
However, none of the cluster nodes seems able to discover any other node on the network. When I query cluster health I get:
curl -XGET '153.77.130.73:9200/_cluster/health?pretty'
{
"error" : {
"root_cause" : [
{
"type" : "master_not_discovered_exception",
"reason" : null
}
],
"type" : "master_not_discovered_exception",
"reason" : null
},
"status" : 503
}
curl -XGET '153.77.130.74:9200/_nodes/transport?pretty=1'
{
"_nodes" : {
"total" : 1,
"successful" : 1,
"failed" : 0
},
"cluster_name" : "baremetal_elastic",
"nodes" : {
"CqezueeaSCSBaU9hLaetwA" : {
"name" : "elastic_009",
"transport_address" : "153.77.130.74:9300",
"host" : "153.77.130.74",
"ip" : "153.77.130.74",
"version" : "5.0.1",
"build_hash" : "080bb47",
"roles" : [
"master",
"ingest"
],
"transport" : {
"bound_address" : [
"153.77.130.74:9300"
],
"publish_address" : "153.77.130.74:9300",
"profiles" : { }
}
}
}
}
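To rule out basic connectivity problems, here is a quick check I can run from each node (hostnames and the transport port are taken from my config above):

```shell
#!/usr/bin/env bash
# For each unicast host from elasticsearch.yml, check that the name
# resolves and that the transport port (9300) accepts connections.
for h in elastic_009 elastic_010 elastic_011; do
  if ! getent hosts "$h" > /dev/null; then
    echo "$h: hostname does not resolve"
  elif timeout 2 bash -c "exec 3<>/dev/tcp/$h/9300" 2>/dev/null; then
    echo "$h: transport port 9300 reachable"
  else
    echo "$h: cannot connect to transport port 9300"
  fi
done
```

If a hostname does not resolve or the port is blocked (e.g. by a firewall), Zen discovery would time out exactly as in the log above.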
On my other Elasticsearch cluster (version 2.4) the same request lists every node in the cluster; here I can only see the node I query.
When I check netstat I see:
# netstat -tuplen
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address Foreign Address State User Inode PID/Program name
tcp 0 0 0.0.0.0:22 0.0.0.0:* LISTEN 0 9114 1393/sshd
tcp 0 0 127.0.0.1:25 0.0.0.0:* LISTEN 0 22536 2571/master
tcp6 0 0 153.77.130.73:9200 :::* LISTEN 996 25050 3725/java
tcp6 0 0 153.77.130.73:9300 :::* LISTEN 996 28677 3725/java
tcp6 0 0 :::22 :::* LISTEN 0 9116 1393/sshd
tcp6 0 0 ::1:25 :::* LISTEN 0 22537 2571/master
udp 0 0 0.0.0.0:68 0.0.0.0:* 0 15946 1164/dhclient
udp 0 0 153.77.130.73:123 0.0.0.0:* 38 13021 803/ntpd
udp 0 0 127.0.0.1:123 0.0.0.0:* 0 17922 803/ntpd
udp 0 0 0.0.0.0:123 0.0.0.0:* 0 17916 803/ntpd
udp 0 0 0.0.0.0:21080 0.0.0.0:* 0 15937 1164/dhclient
udp6 0 0 fe80::1a03:73ff:fed:123 :::* 38 9075 803/ntpd
udp6 0 0 ::1:123 :::* 0 17923 803/ntpd
udp6 0 0 :::123 :::* 0 17917 803/ntpd
udp6 0 0 :::58293 :::* 0 15938 1164/dhclient
It seems the process is listening on tcp6 rather than plain tcp; I am not sure whether this could interfere with Elasticsearch in any way.
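If the tcp6 binding turned out to matter, one thing I could try (untested on my side; this is a standard JVM property, not an Elasticsearch-specific setting) is forcing the IPv4 stack via /etc/elasticsearch/jvm.options:

```
# Make the JVM prefer the IPv4 stack so the sockets show up as plain tcp
# in netstat instead of tcp6:
-Djava.net.preferIPv4Stack=true
```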
Is there any configuration I am missing? Should I give up and use the multicast discovery plugin? That option won't be viable for future (production) clusters.