Cluster discovery worked once, then never again

I'm just getting started with Elasticsearch. When we first set it up on two
machines, they automatically detected each other, formed a cluster, and
began replication (though it didn't complete). Since then, they have never
connected again.

So far I haven't been able to find any successful fixes or resources to
troubleshoot the issue.

The first computer is Windows XP. The second is Windows 7. We also fired
up a third (Windows 7) that was unable to connect with either.

On the Windows 7 computer, however, we sometimes get the following entries in
elasticsearch.log when the XP machine starts Elasticsearch:

org.elasticsearch.transport.ConnectTransportException: [Zemo, Helmut][inet[/XXX.XXX.XXX.XXX:9300]] connect_timeout[30s]
    at org.elasticsearch.transport.netty.NettyTransport.connectToChannels(NettyTransport.java:711)
    at org.elasticsearch.transport.netty.NettyTransport.connectToNode(NettyTransport.java:640)
    at org.elasticsearch.transport.netty.NettyTransport.connectToNode(NettyTransport.java:608)
    at org.elasticsearch.transport.TransportService.connectToNode(TransportService.java:129)
    at org.elasticsearch.discovery.zen.ping.multicast.MulticastZenPing$Receiver$2.run(MulticastZenPing.java:546)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:744)
Caused by: java.net.ConnectException: Connection timed out: no further information: /XXX.XXX.XXX.XXX:9300
    at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
    at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:735)
    at org.elasticsearch.common.netty.channel.socket.nio.NioClientBoss.connect(NioClientBoss.java:150)
    at org.elasticsearch.common.netty.channel.socket.nio.NioClientBoss.processSelectedKeys(NioClientBoss.java:105)
    at org.elasticsearch.common.netty.channel.socket.nio.NioClientBoss.process(NioClientBoss.java:79)
    at org.elasticsearch.common.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:318)
    at org.elasticsearch.common.netty.channel.socket.nio.NioClientBoss.run(NioClientBoss.java:42)
    at org.elasticsearch.common.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
    at org.elasticsearch.common.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42)
    ... 3 more

[2014-01-14 18:40:27,842][WARN ][discovery.zen.ping.multicast] [Grey, Elaine] failed to connect to requesting node [Zemo, Helmut][ncfQANnkRmODwvV_E3f-cw][inet[/XXX.XXX.XXX.XXX:9300]]
org.elasticsearch.transport.ConnectTransportException: [Zemo, Helmut][inet[/XXX.XXX.XXX.XXX:9300]] connect_timeout[30s]
    at org.elasticsearch.transport.netty.NettyTransport.connectToChannels(NettyTransport.java:711)
    at org.elasticsearch.transport.netty.NettyTransport.connectToNode(NettyTransport.java:640)
    at org.elasticsearch.transport.netty.NettyTransport.connectToNode(NettyTransport.java:608)
    at org.elasticsearch.transport.TransportService.connectToNode(TransportService.java:129)
    at org.elasticsearch.discovery.zen.ping.multicast.MulticastZenPing$Receiver$2.run(MulticastZenPing.java:546)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:744)
Caused by: java.net.ConnectException: Connection timed out: no further information: /XXX.XXX.XXX.XXX:9300
    at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
    at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:735)
    at org.elasticsearch.common.netty.channel.socket.nio.NioClientBoss.connect(NioClientBoss.java:150)
    at org.elasticsearch.common.netty.channel.socket.nio.NioClientBoss.processSelectedKeys(NioClientBoss.java:105)
    at org.elasticsearch.common.netty.channel.socket.nio.NioClientBoss.process(NioClientBoss.java:79)
    at org.elasticsearch.common.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:318)
    at org.elasticsearch.common.netty.channel.socket.nio.NioClientBoss.run(NioClientBoss.java:42)
    at org.elasticsearch.common.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
    at org.elasticsearch.common.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42)
    ... 3 more

Does anyone know the issue or any resources that walk through how to
troubleshoot these issues? Everything I read seems to assume these issues
don't exist because ElasticSearch is so easy to get up and running.

Thanks

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/a67796f5-5b6a-4072-bb35-8790e7c7377a%40googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

I guess the usual questions need to be answered: Are the Java versions
exactly the same on all systems?

And just to be sure, are the Elasticsearch versions exactly the same on all
systems?

> Does anyone know the issue or any resources that walk through how to
> troubleshoot these issues? Everything I read seems to assume these issues
> don't exist because Elasticsearch is so easy to get up and running.

It is, on Solaris, Linux, and Mac OS X. I have never tried running ES on
Windows. But a mix of XP and 7.... Maybe this is a case where you might
wish to try unicast discovery to isolate Windows networking issues across
old and new versions of the OS? Or, try it on the two Windows 7 systems and
leave the XP system out of the picture; can you keep them talking to each
other?

Just some thoughts...

Brian
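
For anyone who wants to check the version question above without logging in to each machine, here is a minimal sketch in Python. It assumes each node has HTTP enabled on the default port 9200 and that the root endpoint returns JSON containing `version.number`. The host names in the usage note are hypothetical, and this only compares Elasticsearch versions, not Java versions.

```python
import json
from urllib.request import urlopen


def version_from_root(payload):
    """Extract the Elasticsearch version from the parsed JSON of GET /."""
    return payload["version"]["number"]


def fetch_version(host, port=9200, timeout=5.0):
    """Ask one node for its version over HTTP (assumes HTTP is enabled)."""
    with urlopen(f"http://{host}:{port}/", timeout=timeout) as resp:
        return version_from_root(json.loads(resp.read().decode("utf-8")))


def mismatched(versions):
    """Return True if the host -> version mapping holds more than one version."""
    return len(set(versions.values())) > 1
```

Something like `mismatched({h: fetch_version(h) for h in ["xp-box", "win7-box"]})` (hypothetical host names) would then report whether the nodes disagree.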

Embarrassing - we had set the program exception in Windows Firewall on the
Windows 7 computer, but we had forgotten that Windows Firewall even existed
in XP...which apparently it did, way back then. It was supposed to notify us
every time it blocked a program, but for whatever reason it wasn't. It seems
clear that was the issue, as it is working now. Apologies, and many thanks
Brian.
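
A silent firewall block like this one can be spotted with a plain TCP connection test against the transport port. The following is a rough sketch, not an official tool: `can_connect` is a made-up helper name, and 9300 is the transport port taken from the logs above.

```python
import socket


def can_connect(host, port=9300, timeout=5.0):
    """Return True if a TCP connection to host:port opens within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers timeouts (typical when a firewall drops packets) and refusals.
        return False
```

Run it from each machine against every other machine. A result of False in one direction only points at a firewall on the receiving side.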

There may still be similar issues though. If anyone has tips on finding
out when or where there might be firewall or security issues obstructing
communication, please let me know.

Currently we still have problems when Elasticsearch is shut down on the XP
node (making the Windows 7 node master) and the XP node then tries to rejoin.
The logs on the XP node show nothing, but the logs on the Windows 7 node show
the entries below. However, if the Windows 7 node is then shut down, making
the XP node the master again, everything seems to work fine.

[2014-01-15 16:48:12,778][INFO ][cluster.service ] [NodeHelios] added {[NodeDevelopment01][UZmP9gC_Q9es46k6o9OWTw][inet[/XXX.XXX.XXX.XXX:9300]],}, reason: zen-disco-receive(join from node[[NodeDevelopment01][UZmP9gC_Q9es46k6o9OWTw][inet[/XXX.XXX.XXX.XXX:9300]]])
[2014-01-15 16:48:12,873][WARN ][transport.netty ] [NodeHelios] Message not fully read (request) for [4] and action [cluster/nodeIndexCreated], resetting
[2014-01-15 16:48:12,876][WARN ][transport.netty ] [NodeHelios] Message not fully read (request) for [10] and action [cluster/nodeIndexCreated], resetting
[2014-01-15 16:48:12,878][WARN ][transport.netty ] [NodeHelios] Message not fully read (request) for [6] and action [cluster/nodeIndexCreated], resetting
[2014-01-15 16:48:12,879][WARN ][transport.netty ] [NodeHelios] Message not fully read (request) for [12] and action [cluster/nodeIndexCreated], resetting
[2014-01-15 16:48:12,887][WARN ][transport.netty ] [NodeHelios] Message not fully read (request) for [3] and action [cluster/nodeIndexCreated], resetting
[2014-01-15 16:48:12,890][WARN ][transport.netty ] [NodeHelios] Message not fully read (request) for [2] and action [cluster/nodeIndexCreated], resetting
[2014-01-15 16:48:12,890][WARN ][transport.netty ] [NodeHelios] Message not fully read (request) for [9] and action [cluster/nodeIndexCreated], resetting
[2014-01-15 16:48:12,891][WARN ][transport.netty ] [NodeHelios] Message not fully read (request) for [8] and action [cluster/nodeIndexCreated], resetting
[2014-01-15 16:48:12,891][WARN ][transport.netty ] [NodeHelios] Message not fully read (request) for [5] and action [cluster/nodeIndexCreated], resetting
[2014-01-15 16:48:12,892][WARN ][transport.netty ] [NodeHelios] Message not fully read (request) for [11] and action [cluster/nodeIndexCreated], resetting
[2014-01-15 16:48:12,893][WARN ][transport.netty ] [NodeHelios] Message not fully read (request) for [1] and action [cluster/nodeIndexCreated], resetting
[2014-01-15 16:48:12,894][WARN ][transport.netty ] [NodeHelios] Message not fully read (request) for [7] and action [cluster/nodeIndexCreated], resetting
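
When a log like the one above gets long, it can help to tally the warnings by action instead of reading them one at a time. Here is a small sketch that does this, assuming the entry layout shown above (a bracketed timestamp at the start of each entry). The function names are illustrative only. It also rejoins entries that were wrapped across several lines by an email client.

```python
import re
from collections import Counter

# An entry starts with a timestamp such as [2014-01-15 16:48:12,873].
ENTRY_START = re.compile(r"^\[\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}\]")
NOT_FULLY_READ = re.compile(
    r"Message not fully read \((\w+)\) for \[(\d+)\] and action \[([^\]]+)\]"
)


def join_wrapped(lines):
    """Merge continuation lines back into the log entry they belong to."""
    entries = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if ENTRY_START.match(line) or not entries:
            entries.append(line)
        else:
            entries[-1] += " " + line
    return entries


def count_not_fully_read(lines):
    """Count 'Message not fully read' warnings per transport action."""
    counts = Counter()
    for entry in join_wrapped(lines):
        match = NOT_FULLY_READ.search(entry)
        if match:
            counts[match.group(3)] += 1
    return counts
```

Passing the lines of elasticsearch.log to `count_not_fully_read` gives one count per action name, which makes it easier to see whether a single action accounts for all of the warnings.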
