Elasticsearch 5: Nodes not being discovered

I have a small bare-metal cluster with 9 worker (data) nodes and 3 master nodes.

This is the elasticsearch.yml for the master nodes:

cluster.name: baremetal_elastic
node.name: "${HOSTNAME}"
path.conf: "/etc/elasticsearch"
path.data: "/home/elasticsearch/elasticsearch_data"
path.logs: "/var/log/elasticsearch"
network.host: _eth0:ipv4_
script.engine.groovy.inline.aggs: 'on'
discovery.zen.ping.unicast.hosts: elastic_009, elastic_010, elastic_011
discovery.zen.minimum_master_nodes: '2'
node.data: 'false'
node.master: 'true'

The worker nodes have the same configuration, except that node.data is set to true and node.master is set to false.
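Because discovery.zen.ping.unicast.hosts lists host names (elastic_009, elastic_010, elastic_011) rather than IP addresses, one quick thing to rule out is name resolution on each node. The following is a minimal Python sketch (3.9+), assuming the setting is a plain comma-separated list as written above. parse_unicast_hosts and resolve_hosts are hypothetical helper names, not part of Elasticsearch.

```python
import socket


def parse_unicast_hosts(raw: str) -> list[str]:
    """Split a comma-separated unicast hosts value into bare host names.

    Assumes IPv4 addresses or host names only: an optional ':port' suffix
    is dropped, which would break on IPv6 literals.
    """
    hosts = []
    for entry in raw.split(","):
        entry = entry.strip()
        if not entry:
            continue
        host, _, _port = entry.partition(":")
        hosts.append(host)
    return hosts


def resolve_hosts(hosts: list[str]) -> dict[str, list[str]]:
    """Map each host name to the IPv4 addresses it resolves to on this machine.

    An empty list means the name did not resolve at all.
    """
    resolved = {}
    for host in hosts:
        try:
            infos = socket.getaddrinfo(host, None, socket.AF_INET, socket.SOCK_STREAM)
            resolved[host] = sorted({info[4][0] for info in infos})
        except socket.gaierror:
            resolved[host] = []
    return resolved
```

Running `resolve_hosts(parse_unicast_hosts("elastic_009, elastic_010, elastic_011"))` on every node should return the expected 153.77.130.x address for each master; an empty list for any of them would mean that node cannot even find its peers by name.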

When I look at the logs (in this case, of a master node), I see this:
[2017-01-05T14:38:12,757][INFO ][o.e.p.PluginsService     ] [elastic_009] no plugins loaded
[2017-01-05T14:38:14,590][INFO ][o.e.n.Node               ] [elastic_009] initialized
[2017-01-05T14:38:14,590][INFO ][o.e.n.Node               ] [elastic_009] starting ...
[2017-01-05T14:38:14,747][INFO ][o.e.t.TransportService   ] [elastic_009] publish_address {153.77.130.74:9300}, bound_addresses {153.77.130.74:9300}
[2017-01-05T14:38:14,750][INFO ][o.e.b.BootstrapCheck     ] [elastic_009] bound or publishing to a non-loopback or non-link-local address, enforcing bootstrap checks
[2017-01-05T14:38:44,803][WARN ][o.e.n.Node               ] [elastic_009] timed out while waiting for initial discovery state - timeout: 30s
[2017-01-05T14:38:44,841][INFO ][o.e.h.HttpServer         ] [elastic_009] publish_address {153.77.130.74:9200}, bound_addresses {153.77.130.74:9200}
[2017-01-05T14:38:44,841][INFO ][o.e.n.Node               ] [elastic_009] started

All these machines are connected to the local network through eth0, which is why I set network.host to _eth0:ipv4_. As the publish_address entries in the log show, this correctly resolves to the machine's IP address.

However, none of the cluster nodes seems able to discover any other node on the network. When I request the cluster health, I get:

curl -XGET '153.77.130.73:9200/_cluster/health?pretty'
{
  "error" : {
    "root_cause" : [
      {
        "type" : "master_not_discovered_exception",
        "reason" : null
      }
    ],
    "type" : "master_not_discovered_exception",
    "reason" : null
  },
  "status" : 503
}
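A 503 with master_not_discovered_exception indicates that the node being queried does not know of any elected master. To check every node quickly instead of running curl by hand, here is a minimal Python sketch using only the standard library. fetch_json and describe_health are hypothetical names; the field names (status, cluster_name, number_of_nodes, error.type) are taken from the _cluster/health responses.

```python
import json
import urllib.error
import urllib.request


def fetch_json(url: str) -> dict:
    """GET a URL and decode its JSON body, including error bodies such as the 503 above."""
    try:
        with urllib.request.urlopen(url, timeout=5) as resp:
            return json.load(resp)
    except urllib.error.HTTPError as err:
        # Elasticsearch sends a JSON error document along with non-2xx statuses.
        return json.load(err)


def describe_health(payload: dict) -> str:
    """Summarize a _cluster/health response in one line."""
    if "error" in payload:
        return f"HTTP {payload.get('status')}: {payload['error'].get('type')}"
    return (
        f"cluster {payload.get('cluster_name')} is {payload.get('status')} "
        f"with {payload.get('number_of_nodes')} node(s)"
    )
```

For example, `describe_health(fetch_json("http://153.77.130.73:9200/_cluster/health"))` should report the same master_not_discovered_exception as the curl call, and can be looped over all 12 node addresses.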


curl -XGET '153.77.130.74:9200/_nodes/transport?pretty=1'
{
  "_nodes" : {
    "total" : 1,
    "successful" : 1,
    "failed" : 0
  },
  "cluster_name" : "baremetal_elastic",
  "nodes" : {
    "CqezueeaSCSBaU9hLaetwA" : {
      "name" : "elastic_009",
      "transport_address" : "153.77.130.74:9300",
      "host" : "153.77.130.74",
      "ip" : "153.77.130.74",
      "version" : "5.0.1",
      "build_hash" : "080bb47",
      "roles" : [
        "master",
        "ingest"
      ],
      "transport" : {
        "bound_address" : [
          "153.77.130.74:9300"
        ],
        "publish_address" : "153.77.130.74:9300",
        "profiles" : { }
      }
    }
  }
}
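The nodes info response makes the problem easy to quantify: only one entry comes back where 12 are expected. A small Python sketch that summarizes such a response and lists which expected nodes are absent is shown below. summarize_nodes and missing_nodes are hypothetical helpers, and the expected-name list is an assumption: only elastic_009, elastic_010, and elastic_011 appear in this thread, so the worker names would have to be filled in.

```python
def summarize_nodes(payload: dict) -> list[tuple[str, str, list[str]]]:
    """Return (name, transport_address, roles) for every node in a _nodes response."""
    nodes = payload.get("nodes", {})
    return sorted(
        (node["name"], node["transport_address"], node.get("roles", []))
        for node in nodes.values()
    )


def missing_nodes(payload: dict, expected: list[str]) -> list[str]:
    """Return the expected node names that the queried node does not know about."""
    seen = {name for name, _, _ in summarize_nodes(payload)}
    return sorted(set(expected) - seen)
```

Applied to the response above, `summarize_nodes` yields a single entry for elastic_009 with roles master and ingest, and `missing_nodes` with the three master names would report elastic_010 and elastic_011.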

On my other Elasticsearch cluster (version 2.4), the same request lists every node in the cluster; here I can only see the node I queried.

When I check netstat, I see:

# netstat -tuplen
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address           Foreign Address         State       User       Inode      PID/Program name
tcp        0      0 0.0.0.0:22              0.0.0.0:*               LISTEN      0          9114       1393/sshd
tcp        0      0 127.0.0.1:25            0.0.0.0:*               LISTEN      0          22536      2571/master
tcp6       0      0 153.77.130.73:9200      :::*                    LISTEN      996        25050      3725/java
tcp6       0      0 153.77.130.73:9300      :::*                    LISTEN      996        28677      3725/java
tcp6       0      0 :::22                   :::*                    LISTEN      0          9116       1393/sshd
tcp6       0      0 ::1:25                  :::*                    LISTEN      0          22537      2571/master
udp        0      0 0.0.0.0:68              0.0.0.0:*                           0          15946      1164/dhclient
udp        0      0 153.77.130.73:123       0.0.0.0:*                           38         13021      803/ntpd
udp        0      0 127.0.0.1:123           0.0.0.0:*                           0          17922      803/ntpd
udp        0      0 0.0.0.0:123             0.0.0.0:*                           0          17916      803/ntpd
udp        0      0 0.0.0.0:21080           0.0.0.0:*                           0          15937      1164/dhclient
udp6       0      0 fe80::1a03:73ff:fed:123 :::*                                38         9075       803/ntpd
udp6       0      0 ::1:123                 :::*                                0          17923      803/ntpd
udp6       0      0 :::123                  :::*                                0          17917      803/ntpd
udp6       0      0 :::58293                :::*                                0          15938      1164/dhclient

It seems to be using tcp6 instead of plain tcp, and I am not sure whether this could interfere with Elasticsearch in any way.
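The tcp6 label is typical of JVM processes, which by default open dual-stack IPv6 sockets; what matters here is the address in the Local Address column. To confirm that port 9300 is bound to the eth0 IPv4 address rather than to loopback, the netstat output can be filtered as in this Python sketch. listening_addresses is a hypothetical helper, and it assumes the `netstat -tuplen` column layout shown above (Proto first, Local Address fourth, State sixth).

```python
def listening_addresses(netstat_output: str, port: int) -> list[tuple[str, str]]:
    """Return (proto, local_host) for TCP sockets in LISTEN state on the given port."""
    found = []
    for line in netstat_output.splitlines():
        fields = line.split()
        # Skip headers, UDP rows, and anything that is not a listening TCP socket.
        if len(fields) < 6 or not fields[0].startswith("tcp") or fields[5] != "LISTEN":
            continue
        # rpartition keeps IPv6 hosts such as '::' intact before the last ':'.
        host, _, local_port = fields[3].rpartition(":")
        if local_port == str(port):
            found.append((fields[0], host))
    return found
```

On the output above, port 9300 yields a single tcp6 socket on 153.77.130.73, i.e. the transport layer listens on the LAN address that the other nodes would connect to.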

Is there any configuration I am missing? Should I give up and use the multicast plugin? That option won't be viable for future (production) clusters.

Can you telnet between the nodes?
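If telnet is not installed, the same reachability test can be done with a short Python sketch that opens a plain TCP connection to the transport port. can_connect is a hypothetical helper; 9300 is assumed as the default because that is the transport port in the logs above.

```python
import socket


def can_connect(host: str, port: int = 9300, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds.

    Equivalent in spirit to `telnet host 9300`: it only checks that the port
    accepts connections, not that Elasticsearch's transport handshake works.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unresolvable host names.
        return False
```

Run it from each node against every other node, e.g. `can_connect("elastic_010")` or `can_connect("153.77.130.74")`. A False result on port 9300 would suggest a firewall or routing problem between the machines rather than an Elasticsearch setting.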

You should really upgrade to 5.0.2, ideally 5.1.1; those releases include some bug fixes.
