Connectivity issues with a new/upgraded 2.X cluster? Read here first :)

warkolm · November 27, 2015, 1:44am

If you are having issues with nodes not forming a cluster, or if you are unable to connect to a remote cluster, please read this first!

In Elasticsearch 2.0 we made some major changes to the way default networking is configured.

The primary one is that Elasticsearch no longer listens on all interfaces by default, and instead binds only to loopback (127.0.0.1). This means that if you are coming from 1.X you will need to explicitly set network.host in your elasticsearch.yml config file.
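
As a minimal sketch, assuming the node should be reachable at 192.168.1.10 (a placeholder address, not taken from any real setup), the setting could look something like this:

```yaml
# elasticsearch.yml (sketch; 192.168.1.10 is a hypothetical placeholder address)
# Bind to and publish a non-loopback address so that other nodes and
# remote clients can reach this node. A resolvable host name also works.
network.host: 192.168.1.10
```

Replace the placeholder with an address or host name that actually belongs to the machine and is reachable from the other nodes.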

The second is that multicast discovery has been removed as the default discovery method and is now available via a plugin only.
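
Without multicast, nodes typically need to be told where to find each other via unicast. A rough sketch, assuming two existing nodes with the hypothetical host names es-node-1.example.com and es-node-2.example.com, might look like this:

```yaml
# elasticsearch.yml (sketch; both host names are hypothetical placeholders)
# List some nodes of the cluster that this node should ping on startup.
# The port is the transport port (9300 by default), not the HTTP port.
discovery.zen.ping.unicast.hosts: ["es-node-1.example.com:9300", "es-node-2.example.com:9300"]
```

The list does not need to contain every node, but each entry has to be reachable from the node being configured.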

For more information, please read the release notes.

riki · January 23, 2016, 9:08pm

Hi,

I started using ES 2.0. In the elasticsearch.yml file I have set network.host to the host name.

I see lots of exceptions like the one below:

[2016-01-23 09:01:02,388][WARN ][cluster.service ] [Mangle] failed to reconnect to node {Gee}{MtDkhr9rSza5QoSqKTqkrA}{127.0.0.1}{127.0.0.1:9303}{data=false, client=true}
ConnectTransportException[[Gee][127.0.0.1:9303] connect_timeout[30s]]; nested: ConnectException[Connection refused: no further information: /127.0.0.1:9303];
at org.elasticsearch.transport.netty.NettyTransport.connectToChannels(NettyTransport.java:922)
at org.elasticsearch.transport.netty.NettyTransport.connectToNode(NettyTransport.java:855)
at org.elasticsearch.transport.netty.NettyTransport.connectToNode(NettyTransport.java:828)
at org.elasticsearch.transport.TransportService.connectToNode(TransportService.java:243)
at org.elasticsearch.cluster.service.InternalClusterService$ReconnectToNodes.run(InternalClusterService.java:598)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.net.ConnectException: Connection refused: no further information: /127.0.0.1:9303
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
at org.jboss.netty.channel.socket.nio.NioClientBoss.connect(NioClientBoss.java:152)
at org.jboss.netty.channel.socket.nio.NioClientBoss.processSelectedKeys(NioClientBoss.java:105)
at org.jboss.netty.channel.socket.nio.NioClientBoss.process(NioClientBoss.java:79)
at org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:337)
at org.jboss.netty.channel.socket.nio.NioClientBoss.run(NioClientBoss.java:42)
at org.jboss.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
at org.jboss.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42)
... 3 more

I am not sure why the transport client is still trying to connect to 127.0.0.1 even after setting network.host. I also tried setting transport.host to the host name, but no luck.

What am I doing wrong?

Thanks.

warkolm · January 24, 2016, 2:04am

It'd help if you provided the relevant config sections.

riki · January 24, 2016, 3:07am

Hi Mark,

Here are the settings that I have put in elasticsearch.yml.

cluster.name: TESTCLUSTER
network.bind_host: es-node.test.com
network.publish_host: es-node.test.com
transport.tcp.port: 9300
http.port: 9200
http.enabled: false
discovery.zen.ping.multicast.enabled: false
discovery.zen.ping.unicast.hosts: master-node.test.com:9300
index.fielddata.cache: node
indices.fielddata.cache.size: 1m
indices.cache.filter.size: 1m
threadpool.bulk.type: fixed
threadpool.bulk.queue_size: 8000

In the log I see the following while starting up:

[2016-01-23 08:52:59,603][INFO ][node ] [Mangle] version[2.0.0], pid[1844], build[de54438/2015-10-22T08:09:48Z]
[2016-01-23 08:52:59,605][INFO ][node ] [Mangle] initializing ...
[2016-01-23 08:52:59,942][INFO ][plugins ] [Mangle] loaded , sites
[2016-01-23 08:53:00,337][INFO ][env ] [Mangle] using [1] data paths, mounts [[(C:)]], net usable_space [82.1gb], net total_space [99.6gb], spins? [unknown], types [NTFS]
[2016-01-23 08:53:08,744][INFO ][node ] [Mangle] initialized
[2016-01-23 08:53:08,745][INFO ][node ] [Mangle] starting ...
[2016-01-23 08:53:09,702][INFO ][transport ] [Mangle] publish_address {es-node.test.com/10.135.20.24:9300}, bound_addresses {10.135.20.24:9300}
[2016-01-23 08:53:09,744][INFO ][discovery ] [Mangle] TESTCLUSTER/CYoSCFfaRMSPvb9zvn0KCw
[2016-01-23 08:53:19,141][INFO ][cluster.service ] [Mangle] detected_master {Ankhi}{SEMYv5YIQQ27KQMywDxcjQ}{10.135.20.45}{10.135.20.45:9300}, added {{Laura Dean}{bme0NfOERmC4qkRz4PEt3A}{127.0.0.1}{127.0.0.1:9301}{data=false, client=true},{Air-Walker}{kGeX3KnXSG2Mqv7shVeKGQ}{10.129.152.151}{10.129.152.151:9300},{Gee}{MtDkhr9rSza5QoSqKTqkrA}{127.0.0.1}{127.0.0.1:9303}{data=false, client=true},{Fault Zone}{Girj3ldxQgacdj-SHNim6w}{127.0.0.1}{127.0.0.1:9302}{data=false, client=true},{Ankhi}{SEMYv5YIQQ27KQMywDxcjQ}{10.135.20.45}{10.135.20.45:9300},}, reason: zen-disco-receive(from master [{Ankhi}{SEMYv5YIQQ27KQMywDxcjQ}{10.135.20.45}{10.135.20.45:9300}])
[2016-01-23 08:53:20,315][WARN ][cluster.service ] [Mangle] failed to connect to node [{Gee}{MtDkhr9rSza5QoSqKTqkrA}{127.0.0.1}{127.0.0.1:9303}{data=false, client=true}]

Thanks.

warkolm · January 24, 2016, 3:31am

Well, it's connecting to a valid interface.

I'd suggest you create a separate thread and ask there. I don't think this is a settings issue related to the original post.

Python_Coder · February 21, 2016, 11:22am

Are you sure that the host name resolves to the non-loopback IP address on node Gee?
