Connectivity issues with a new/upgraded 2.X cluster? Read here first :)

If you are having issues with nodes not forming a cluster, or if you are unable to connect to a remote cluster, please read this first!

In Elasticsearch 2.0 we made some major changes to the way default networking is configured.

The primary one is that Elasticsearch no longer listens on all interfaces by default, and instead binds only to loopback/127.0.0.1. This means that if you are coming from 1.X, you will need to explicitly set network.host in your elasticsearch.yml config file.
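
As a minimal sketch of that change, assuming a node whose non-loopback address is 192.168.1.10 (a made-up address, substitute your own), the relevant line in elasticsearch.yml might look like this:

```yaml
# elasticsearch.yml
# Bind to and publish a specific non-loopback address instead of the
# 2.0 default of loopback only. 192.168.1.10 is a placeholder address.
network.host: 192.168.1.10
```

network.host sets both the bind address and the publish address. If those need to differ, network.bind_host and network.publish_host can be set separately instead.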

The second is that multicast discovery has been removed as the default discovery method and is now available via a plugin only.
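
If you relied on multicast in 1.X, one option is to list the hosts to ping explicitly via unicast. A minimal sketch, assuming two hypothetical hosts named es-node-1 and es-node-2 that listen on transport port 9300:

```yaml
# elasticsearch.yml
# Unicast discovery: hosts this node contacts when joining the cluster.
# es-node-1 and es-node-2 are placeholder host names.
discovery.zen.ping.unicast.hosts: ["es-node-1:9300", "es-node-2:9300"]
```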

For more information, please read the release notes, here.

Hi,

I started using ES 2.0. In the elasticsearch.yml file I have set network.host to the host name.

I see lots of exceptions like the one below:

[2016-01-23 09:01:02,388][WARN ][cluster.service ] [Mangle] failed to reconnect to node {Gee}{MtDkhr9rSza5QoSqKTqkrA}{127.0.0.1}{127.0.0.1:9303}{data=false, client=true}
ConnectTransportException[[Gee][127.0.0.1:9303] connect_timeout[30s]]; nested: ConnectException[Connection refused: no further information: /127.0.0.1:9303];
at org.elasticsearch.transport.netty.NettyTransport.connectToChannels(NettyTransport.java:922)
at org.elasticsearch.transport.netty.NettyTransport.connectToNode(NettyTransport.java:855)
at org.elasticsearch.transport.netty.NettyTransport.connectToNode(NettyTransport.java:828)
at org.elasticsearch.transport.TransportService.connectToNode(TransportService.java:243)
at org.elasticsearch.cluster.service.InternalClusterService$ReconnectToNodes.run(InternalClusterService.java:598)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.net.ConnectException: Connection refused: no further information: /127.0.0.1:9303
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
at org.jboss.netty.channel.socket.nio.NioClientBoss.connect(NioClientBoss.java:152)
at org.jboss.netty.channel.socket.nio.NioClientBoss.processSelectedKeys(NioClientBoss.java:105)
at org.jboss.netty.channel.socket.nio.NioClientBoss.process(NioClientBoss.java:79)
at org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:337)
at org.jboss.netty.channel.socket.nio.NioClientBoss.run(NioClientBoss.java:42)
at org.jboss.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
at org.jboss.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42)
... 3 more

I am not sure why the transport client is still trying to connect to 127.0.0.1 even after setting network.host. I also tried setting transport.host to the host name, but no luck.

What am I doing wrong?

Thanks.

It'd help if you provided the relevant config sections.

Hi Mark,

Here are the settings that I have put in elasticsearch.yml.

cluster.name: TESTCLUSTER
network.bind_host: es-node.test.com
network.publish_host: es-node.test.com
transport.tcp.port: 9300
http.port: 9200
http.enabled: false
discovery.zen.ping.multicast.enabled: false
discovery.zen.ping.unicast.hosts: master-node.test.com:9300
index.fielddata.cache: node
indices.fielddata.cache.size: 1m
indices.cache.filter.size: 1m
threadpool.bulk.type: fixed
threadpool.bulk.queue_size: 8000

In the log I see the following while starting up:

[2016-01-23 08:52:59,603][INFO ][node ] [Mangle] version[2.0.0], pid[1844], build[de54438/2015-10-22T08:09:48Z]
[2016-01-23 08:52:59,605][INFO ][node ] [Mangle] initializing ...
[2016-01-23 08:52:59,942][INFO ][plugins ] [Mangle] loaded [], sites []
[2016-01-23 08:53:00,337][INFO ][env ] [Mangle] using [1] data paths, mounts [[(C:)]], net usable_space [82.1gb], net total_space [99.6gb], spins? [unknown], types [NTFS]
[2016-01-23 08:53:08,744][INFO ][node ] [Mangle] initialized
[2016-01-23 08:53:08,745][INFO ][node ] [Mangle] starting ...
[2016-01-23 08:53:09,702][INFO ][transport ] [Mangle] publish_address {es-node.test.com/10.135.20.24:9300}, bound_addresses {10.135.20.24:9300}
[2016-01-23 08:53:09,744][INFO ][discovery ] [Mangle] TESTCLUSTER/CYoSCFfaRMSPvb9zvn0KCw
[2016-01-23 08:53:19,141][INFO ][cluster.service ] [Mangle] detected_master {Ankhi}{SEMYv5YIQQ27KQMywDxcjQ}{10.135.20.45}{10.135.20.45:9300}, added {{Laura Dean}{bme0NfOERmC4qkRz4PEt3A}{127.0.0.1}{127.0.0.1:9301}{data=false, client=true},{Air-Walker}{kGeX3KnXSG2Mqv7shVeKGQ}{10.129.152.151}{10.129.152.151:9300},{Gee}{MtDkhr9rSza5QoSqKTqkrA}{127.0.0.1}{127.0.0.1:9303}{data=false, client=true},{Fault Zone}{Girj3ldxQgacdj-SHNim6w}{127.0.0.1}{127.0.0.1:9302}{data=false, client=true},{Ankhi}{SEMYv5YIQQ27KQMywDxcjQ}{10.135.20.45}{10.135.20.45:9300},}, reason: zen-disco-receive(from master [{Ankhi}{SEMYv5YIQQ27KQMywDxcjQ}{10.135.20.45}{10.135.20.45:9300}])
[2016-01-23 08:53:20,315][WARN ][cluster.service ] [Mangle] failed to connect to node [{Gee}{MtDkhr9rSza5QoSqKTqkrA}{127.0.0.1}{127.0.0.1:9303}{data=false, client=true}]

Thanks.

Well, it's connecting to a valid interface.

I'd suggest you create a separate thread and ask there; I don't think this is a settings issue related to the original post.

Are you sure that the host name resolves to the non-loopback IP address on node Gee?