ElasticSearch 5 - Nodes not being discovered


(Jose Luis Navarro Vicente) #1

I have a small baremetal cluster with 9 workers and 3 master nodes.

This is the elasticsearch.yml for the master nodes:

cluster.name: baremetal_elastic
node.name: "${HOSTNAME}"
path.conf: "/etc/elasticsearch"
path.data: "/home/elasticsearch/elasticsearch_data"
path.logs: "/var/log/elasticsearch"
network.host: _eth0:ipv4_
script.engine.groovy.inline.aggs: 'on'
discovery.zen.ping.unicast.hosts: elastic_009, elastic_010, elastic_011
discovery.zen.minimum_master_nodes: '2'
node.data: 'false'
node.master: 'true'

The worker nodes have the same configuration, except that node.data is set to true and node.master is set to false.
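As a sanity check on discovery.zen.minimum_master_nodes: the usual quorum rule is (master-eligible nodes / 2) + 1, which gives 2 for the three masters here. A minimal Python sketch of that arithmetic (the function name is made up for illustration and is not part of ElasticSearch):

```python
def minimum_master_nodes(master_eligible: int) -> int:
    """Quorum of master-eligible nodes: floor(n / 2) + 1."""
    if master_eligible < 1:
        raise ValueError("need at least one master-eligible node")
    return master_eligible // 2 + 1


# Three master-eligible nodes, as in this cluster.
print(minimum_master_nodes(3))  # 2
```

So the configured value matches the quorum for this cluster, and the setting itself is unlikely to be what blocks discovery.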

When I look at the logs (in this case, on a master node), I see this:
[2017-01-05T14:38:12,757][INFO ][o.e.p.PluginsService     ] [elastic_009] no plugins loaded
[2017-01-05T14:38:14,590][INFO ][o.e.n.Node               ] [elastic_009] initialized
[2017-01-05T14:38:14,590][INFO ][o.e.n.Node               ] [elastic_009] starting ...
[2017-01-05T14:38:14,747][INFO ][o.e.t.TransportService   ] [elastic_009] publish_address {153.77.130.74:9300}, bound_addresses {153.77.130.74:9300}
[2017-01-05T14:38:14,750][INFO ][o.e.b.BootstrapCheck     ] [elastic_009] bound or publishing to a non-loopback or non-link-local address, enforcing bootstrap checks
[2017-01-05T14:38:44,803][WARN ][o.e.n.Node               ] [elastic_009] timed out while waiting for initial discovery state - timeout: 30s
[2017-01-05T14:38:44,841][INFO ][o.e.h.HttpServer         ] [elastic_009] publish_address {153.77.130.74:9200}, bound_addresses {153.77.130.74:9200}
[2017-01-05T14:38:44,841][INFO ][o.e.n.Node               ] [elastic_009] started

All of these machines are connected to the local network through eth0, which is why network.host is set to _eth0:ipv4_. That resolves correctly to the machine's IP address, as can be seen in the log.

However, none of the cluster nodes seems able to discover any other node on the network. When I try to get the cluster health, I get:

curl -XGET '153.77.130.73:9200/_cluster/health?pretty'
{
  "error" : {
    "root_cause" : [
      {
        "type" : "master_not_discovered_exception",
        "reason" : null
      }
    ],
    "type" : "master_not_discovered_exception",
    "reason" : null
  },
  "status" : 503
}


curl -XGET '153.77.130.74:9200/_nodes/transport?pretty=1'
{
  "_nodes" : {
    "total" : 1,
    "successful" : 1,
    "failed" : 0
  },
  "cluster_name" : "baremetal_elastic",
  "nodes" : {
    "CqezueeaSCSBaU9hLaetwA" : {
      "name" : "elastic_009",
      "transport_address" : "153.77.130.74:9300",
      "host" : "153.77.130.74",
      "ip" : "153.77.130.74",
      "version" : "5.0.1",
      "build_hash" : "080bb47",
      "roles" : [
        "master",
        "ingest"
      ],
      "transport" : {
        "bound_address" : [
          "153.77.130.74:9300"
        ],
        "publish_address" : "153.77.130.74:9300",
        "profiles" : { }
      }
    }
  }
}

On my other ElasticSearch cluster (version 2.4), the same request lists every node in the cluster; here I can only see the node I query.
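To compare what each node reports, the node names in a _nodes/transport response can be listed with a few lines of Python. This is only a sketch: it assumes the JSON has already been fetched (for example, saved from the curl command above), and node_names is a hypothetical helper, not an ElasticSearch API.

```python
import json
from typing import List


def node_names(response: dict) -> List[str]:
    """Return the sorted names of all nodes listed in a _nodes response."""
    return sorted(node["name"] for node in response.get("nodes", {}).values())


# Trimmed copy of the response shown above.
sample = json.loads("""
{
  "_nodes": {"total": 1, "successful": 1, "failed": 0},
  "cluster_name": "baremetal_elastic",
  "nodes": {
    "CqezueeaSCSBaU9hLaetwA": {
      "name": "elastic_009",
      "transport_address": "153.77.130.74:9300"
    }
  }
}
""")

print(node_names(sample))  # ['elastic_009']
```

In a cluster that has formed, the list should contain all twelve nodes regardless of which node is queried.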

When I check netstat I see:

# netstat -tuplen
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address           Foreign Address         State       User       Inode      PID/Program name
tcp        0      0 0.0.0.0:22              0.0.0.0:*               LISTEN      0          9114       1393/sshd
tcp        0      0 127.0.0.1:25            0.0.0.0:*               LISTEN      0          22536      2571/master
tcp6       0      0 153.77.130.73:9200      :::*                    LISTEN      996        25050      3725/java
tcp6       0      0 153.77.130.73:9300      :::*                    LISTEN      996        28677      3725/java
tcp6       0      0 :::22                   :::*                    LISTEN      0          9116       1393/sshd
tcp6       0      0 ::1:25                  :::*                    LISTEN      0          22537      2571/master
udp        0      0 0.0.0.0:68              0.0.0.0:*                           0          15946      1164/dhclient
udp        0      0 153.77.130.73:123       0.0.0.0:*                           38         13021      803/ntpd
udp        0      0 127.0.0.1:123           0.0.0.0:*                           0          17922      803/ntpd
udp        0      0 0.0.0.0:123             0.0.0.0:*                           0          17916      803/ntpd
udp        0      0 0.0.0.0:21080           0.0.0.0:*                           0          15937      1164/dhclient
udp6       0      0 fe80::1a03:73ff:fed:123 :::*                                38         9075       803/ntpd
udp6       0      0 ::1:123                 :::*                                0          17923      803/ntpd
udp6       0      0 :::123                  :::*                                0          17917      803/ntpd
udp6       0      0 :::58293                :::*                                0          15938      1164/dhclient

It seems to be using tcp6 instead of regular tcp. I am not sure whether this could interfere with ElasticSearch.

Is there any configuration I am missing? Should I give up and use the multicast plugin? That option won't be valid for future (production) clusters.


(Mark Walkom) #2

Can you telnet between the nodes?
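If telnet is not installed, roughly the same check can be scripted. The sketch below assumes the default transport port 9300 (the one shown in the log above); can_connect and check_hosts are hypothetical helper names, and the script only tests whether a TCP connection opens, nothing ElasticSearch-specific.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port opens within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name-resolution failures.
        return False


def check_hosts(hosts, port=9300):
    """Map each host name to whether its transport port is reachable."""
    return {host: can_connect(host, port) for host in hosts}
```

Running check_hosts(["elastic_009", "elastic_010", "elastic_011"]) on each node would show whether the names in discovery.zen.ping.unicast.hosts resolve and whether port 9300 is reachable from that node.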

You should really upgrade to 5.0.2, or ideally 5.1.1; there are some bug fixes in those.

