I'm trying to spin up a new 5-node cluster, and I'm not getting anywhere.
All 5 nodes are running a variation of the same config file; the only differences are the IP addresses of the other nodes and the hostname, plus 2 of the nodes are data-only nodes.
cluster.name: testcluster
node.name: node1
path.data: /opt/elasticsearch/data
path.logs: /var/log/elasticsearch
bootstrap.memory_lock: true
network.host: [_local_, _em1_]
network.bind_host: [_local_, _em1_]
network.publish_host: [_local_, _em1_]
discovery.zen.ping.unicast.hosts: ["10.0.1.2", "10.0.1.3", "10.0.1.4", "10.0.1.5"]
discovery.zen.minimum_master_nodes: 3
gateway.recovery_after_nodes: 3
node.master: true
node.data: true
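For what it's worth, a quick way to check whether the transport port is even reachable from one node to the others would be something like this (a sketch; the IPs are the ones from my unicast.hosts list above, and it assumes nc/netcat is installed):

```shell
# Probe the transport port (9300) on each of the other nodes.
# IPs taken from discovery.zen.ping.unicast.hosts in my config.
checked=0
for ip in 10.0.1.2 10.0.1.3 10.0.1.4 10.0.1.5; do
  if nc -z -w 2 "$ip" 9300 2>/dev/null; then
    echo "$ip:9300 reachable"
  else
    echo "$ip:9300 unreachable"
  fi
  checked=$((checked + 1))
done
```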
The log errors I'm seeing are all very similar to this (logging set to debug):
[2016-10-21 10:14:31,563][DEBUG][transport.netty ] [node1] using profile[default], worker_count[16], port[9300-9400], bind_host[null], publish_host[null], compress[false], connect_timeout[30s], connections_per_node[2/3/6/1/1], receive_predictor[512kb->512kb]
[2016-10-21 10:14:31,615][DEBUG][transport.netty ] [node1] binding server bootstrap to: ::1
[2016-10-21 10:14:31,638][DEBUG][transport.netty ] [node1] Bound profile [default] to address {[::1]:9300}
[2016-10-21 10:14:31,640][DEBUG][transport.netty ] [node1] Bound profile [default] to address {127.0.0.1:9300}
[2016-10-21 10:14:31,641][DEBUG][transport.netty ] [node1] Bound profile [default] to address {10.0.1.1:9300}
[2016-10-21 10:14:31,642][DEBUG][transport.netty ] [node1] Bound profile [default] to address {[fe80::250:56ff:fe01:438]:9300}
[2016-10-21 10:14:31,644][INFO ][transport ] [node1] publish_address {10.0.1.1:9300}, bound_addresses {[::1]:9300}, {127.0.0.1:9300}, {10.0.1.1:9300}, {[ff88::001:88ff:ff88:001]:9300}
[2016-10-21 10:14:31,649][INFO ][discovery ] [node1] testcluster/1DdD32DdDDDD-3dD4Dd52d
[2016-10-21 10:14:31,652][DEBUG][cluster.service ] [node1] processing [initial_join]: execute
[2016-10-21 10:14:31,656][DEBUG][cluster.service ] [node1] processing [initial_join]: took 3ms no change in cluster_state
[2016-10-21 10:14:31,715][WARN ][transport.netty ] [node1] exception caught on transport layer [[id: 0xc564d8c7]], closing connection
java.net.NoRouteToHostException: No route to host
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
I don't quite understand why I'm seeing this. Occasionally the other nodes will try to join, so I'll see something like this:
[2016-10-21 10:14:37,680][DEBUG][discovery.zen ] [testcluster] filtered ping responses: (filter_client[true], filter_data[false])
--> ping_response{node [{node3}{WfGuHey-RG65D9UH5ZMZNw}{10.0.1.2}{10.0.1.2:9300}], id[489], master [null], hasJoinedOnce [false], cluster_name[graylog]}
[2016-10-21 10:14:37,681][WARN ][transport.netty ] [testcluster] exception caught on transport layer [[id: 0x3cab43aa]], closing connection
java.net.NoRouteToHostException: No route to host
I have SELinux set to Permissive. Can anyone give me a clue?
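In case it turns out to be a host firewall rather than SELinux, I assume opening the transport port range on each node would look something like this (untested sketch; firewalld syntax, since these are CentOS/RHEL boxes):

```shell
# Open the transport port range (9300-9400 per my config) on every node.
firewall-cmd --permanent --add-port=9300-9400/tcp
# HTTP port, if clients also need to reach the node directly.
firewall-cmd --permanent --add-port=9200/tcp
firewall-cmd --reload
```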