Connectivity issues with a new/upgraded 2.X cluster? Read here first :)


(Mark Walkom) #1

If you are having issues with nodes not forming a cluster, or if you are unable to connect to a remote cluster, please read this first!

In Elasticsearch 2.0 we made some major changes to the way default networking is configured.

The primary one is that Elasticsearch no longer listens on all interfaces by default, and instead binds only to loopback (127.0.0.1). This means that if you are coming from 1.X, you will need to explicitly set network.host in your elasticsearch.yml config file.
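
As a minimal sketch of that setting, assuming a node whose non-loopback address is 192.168.1.10 (a hypothetical address; substitute your own IP or a resolvable host name), elasticsearch.yml would contain something like:

```yaml
# elasticsearch.yml
# 192.168.1.10 is a placeholder; use this node's own address or host name.
# network.host sets both the bind address and the publish address.
network.host: 192.168.1.10
```

After a restart, the transport line in the startup log should report publish_address and bound_addresses with that address rather than 127.0.0.1.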

The second is that multicast discovery has been removed as the default discovery method and is now available via a plugin only.
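
With multicast out of the defaults, nodes find each other through unicast, so each node needs a list of hosts to contact. A sketch, assuming two hypothetical master-eligible nodes named es-master-1 and es-master-2 on the default transport port 9300:

```yaml
# elasticsearch.yml
# The host names below are placeholders; list your own master-eligible nodes.
discovery.zen.ping.unicast.hosts: ["es-master-1:9300", "es-master-2:9300"]
```

The same list would normally go on every node in the cluster.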

For more information, please read the release notes, here.



(riki) #4

Hi,

I started using ES 2.0. In the elasticsearch.yml file I have set network.host to the host name.

I see lots of exceptions like the one below:

[2016-01-23 09:01:02,388][WARN ][cluster.service ] [Mangle] failed to reconnect to node {Gee}{MtDkhr9rSza5QoSqKTqkrA}{127.0.0.1}{127.0.0.1:9303}{data=false, client=true}
ConnectTransportException[[Gee][127.0.0.1:9303] connect_timeout[30s]]; nested: ConnectException[Connection refused: no further information: /127.0.0.1:9303];
at org.elasticsearch.transport.netty.NettyTransport.connectToChannels(NettyTransport.java:922)
at org.elasticsearch.transport.netty.NettyTransport.connectToNode(NettyTransport.java:855)
at org.elasticsearch.transport.netty.NettyTransport.connectToNode(NettyTransport.java:828)
at org.elasticsearch.transport.TransportService.connectToNode(TransportService.java:243)
at org.elasticsearch.cluster.service.InternalClusterService$ReconnectToNodes.run(InternalClusterService.java:598)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.net.ConnectException: Connection refused: no further information: /127.0.0.1:9303
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
at org.jboss.netty.channel.socket.nio.NioClientBoss.connect(NioClientBoss.java:152)
at org.jboss.netty.channel.socket.nio.NioClientBoss.processSelectedKeys(NioClientBoss.java:105)
at org.jboss.netty.channel.socket.nio.NioClientBoss.process(NioClientBoss.java:79)
at org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:337)
at org.jboss.netty.channel.socket.nio.NioClientBoss.run(NioClientBoss.java:42)
at org.jboss.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
at org.jboss.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42)
... 3 more

I am not sure why the transport client is still trying to connect to 127.0.0.1 even after setting network.host. I also tried setting transport.host to the host name, but no luck.

What am I doing wrong?

Thanks.


(Mark Walkom) #5

It'd help if you provided the relevant config sections.


(riki) #6

Hi Mark,

Here are the settings that I have put in elasticsearch.yml:

cluster.name: TESTCLUSTER
network.bind_host: es-node.test.com
network.publish_host: es-node.test.com
transport.tcp.port: 9300
http.port: 9200
http.enabled: false
discovery.zen.ping.multicast.enabled: false
discovery.zen.ping.unicast.hosts: master-node.test.com:9300
index.fielddata.cache: node
indices.fielddata.cache.size: 1m
indices.cache.filter.size: 1m
threadpool.bulk.type: fixed
threadpool.bulk.queue_size: 8000

In the log I see the following while starting up:

[2016-01-23 08:52:59,603][INFO ][node ] [Mangle] version[2.0.0], pid[1844], build[de54438/2015-10-22T08:09:48Z]
[2016-01-23 08:52:59,605][INFO ][node ] [Mangle] initializing ...
[2016-01-23 08:52:59,942][INFO ][plugins ] [Mangle] loaded [], sites []
[2016-01-23 08:53:00,337][INFO ][env ] [Mangle] using [1] data paths, mounts [[(C:)]], net usable_space [82.1gb], net total_space [99.6gb], spins? [unknown], types [NTFS]
[2016-01-23 08:53:08,744][INFO ][node ] [Mangle] initialized
[2016-01-23 08:53:08,745][INFO ][node ] [Mangle] starting ...
[2016-01-23 08:53:09,702][INFO ][transport ] [Mangle] publish_address {es-node.test.com/10.135.20.24:9300}, bound_addresses {10.135.20.24:9300}
[2016-01-23 08:53:09,744][INFO ][discovery ] [Mangle] TESTCLUSTER/CYoSCFfaRMSPvb9zvn0KCw
[2016-01-23 08:53:19,141][INFO ][cluster.service ] [Mangle] detected_master {Ankhi}{SEMYv5YIQQ27KQMywDxcjQ}{10.135.20.45}{10.135.20.45:9300}, added {{Laura Dean}{bme0NfOERmC4qkRz4PEt3A}{127.0.0.1}{127.0.0.1:9301}{data=false, client=true},{Air-Walker}{kGeX3KnXSG2Mqv7shVeKGQ}{10.129.152.151}{10.129.152.151:9300},{Gee}{MtDkhr9rSza5QoSqKTqkrA}{127.0.0.1}{127.0.0.1:9303}{data=false, client=true},{Fault Zone}{Girj3ldxQgacdj-SHNim6w}{127.0.0.1}{127.0.0.1:9302}{data=false, client=true},{Ankhi}{SEMYv5YIQQ27KQMywDxcjQ}{10.135.20.45}{10.135.20.45:9300},}, reason: zen-disco-receive(from master [{Ankhi}{SEMYv5YIQQ27KQMywDxcjQ}{10.135.20.45}{10.135.20.45:9300}])
[2016-01-23 08:53:20,315][WARN ][cluster.service ] [Mangle] failed to connect to node [{Gee}{MtDkhr9rSza5QoSqKTqkrA}{127.0.0.1}{127.0.0.1:9303}{data=false, client=true}]

Thanks.


(Mark Walkom) #7

Well, it's connecting to a valid interface.

I'd suggest you create a separate thread and ask; I don't think this is a settings issue related to the original post.


(Python Coder) #8

Are you sure that the host name resolves to the non-loopback IP address on node Gee?

