Node automatic (re)connection when not using multicast

Hello Elasticers,

I have a question about connection retry / reconnection when not using
multicast discovery. When I start the elasticsearch server and then
start my java node, everything works fine. If I launch the node
first, and the server a bit later (even before the 30 second
discovery timeout), they don't find each other.

The java node is started this way:

Settings settings = ImmutableSettings.settingsBuilder()
    .put("http.enabled", "false")
    .put("transport.tcp.port", "9301-9400")
    .put("discovery.zen.ping.multicast.enabled", "false")
    .put("discovery.zen.ping.unicast.hosts", "localhost")
    .build();

node = NodeBuilder.nodeBuilder().client(true).settings(settings).clusterName("testCluster").node();

The server (0.18.7) only has this setting:
discovery.zen.ping.multicast.enabled: false

Setting http.enabled to false and starting the transport port range at
9301 on the client ensures that the server will be available on :9200
and :9300.

When the client starts, it binds the right port and searches for a
master (not yet started):

[2012-02-28 14:53:31,648][main][org.elasticsearch.node] INFO -
[Blindside] {0.18.7}[40274]: initializing ...
[2012-02-28 14:53:31,653][main][org.elasticsearch.plugins] INFO -
[Blindside] loaded [], sites []
[2012-02-28 14:53:32,895][main][org.elasticsearch.node] INFO -
[Blindside] {0.18.7}[40274]: initialized
[2012-02-28 14:53:32,895][main][org.elasticsearch.node] INFO -
[Blindside] {0.18.7}[40274]: starting ...
[2012-02-28 14:53:32,949][main][org.elasticsearch.transport] INFO -
[Blindside] bound_address {inet[/0.0.0.0:9301]}, publish_address
{inet[/192.168.0.16:9301]}
[2012-02-28 14:54:02,955][main][org.elasticsearch.discovery] WARN -
[Blindside] waited for 30s and no initial state was set by the
discovery
[2012-02-28 14:54:02,957][main][org.elasticsearch.discovery] INFO -
[Blindside] archivezen-dc42/37d45JwRRD-9LXqTHjIVSA
[2012-02-28 14:54:02,958][main][org.elasticsearch.node] INFO -
[Blindside] {0.18.7}[40274]: started

But when starting the server, just a few seconds after 14:53:32 (so
before the 30s timeout):

[2012-02-28 14:53:36,258][INFO ][node ] [American
Samurai] {0.18.7}[40287]: initializing ...
[2012-02-28 14:53:36,267][INFO ][plugins ] [American
Samurai] loaded [], sites [head]
[2012-02-28 14:53:38,553][INFO ][node ] [American
Samurai] {0.18.7}[40287]: initialized
[2012-02-28 14:53:38,554][INFO ][node ] [American
Samurai] {0.18.7}[40287]: starting ...
[2012-02-28 14:53:38,632][INFO ][transport ] [American
Samurai] bound_address {inet[/0.0.0.0:9300]}, publish_address
{inet[/192.168.0.16:9300]}
[2012-02-28 14:53:41,720][INFO ][cluster.service ] [American
Samurai] new_master [American
Samurai][9OUbXNBqS5KH2QWyDMMbNQ][inet[/192.168.0.16:9300]], reason:
zen-disco-join (elected_as_master)
[2012-02-28 14:53:41,765][INFO ][discovery ] [American
Samurai] archivezen-dc42/9OUbXNBqS5KH2QWyDMMbNQ
[2012-02-28 14:53:41,874][INFO ][http ] [American
Samurai] bound_address {inet[/0.0.0.0:9200]}, publish_address
{inet[/192.168.0.16:9200]}
[2012-02-28 14:53:41,875][INFO ][node ] [American
Samurai] {0.18.7}[40287]: started
[2012-02-28 14:53:42,309][INFO ][gateway ] [American
Samurai] recovered [1] indices into cluster_state

The server doesn't find any other node.

A few seconds later the client throws an exception:

[2012-02-28 14:55:18,258][elasticsearch[cached]-pool-2-thread-3][org.elasticsearch.discovery.zen.ping.unicast]
WARN - [Blindside] failed to send ping to
[[#zen_unicast_1#][inet[localhost/127.0.0.1:9301]]]
org.elasticsearch.transport.SendRequestTransportException:
[][inet[localhost/127.0.0.1:9301]][discovery/zen/unicast]
at org.elasticsearch.transport.TransportService.sendRequest(TransportService.java:196)
at org.elasticsearch.discovery.zen.ping.unicast.UnicastZenPing.sendPingRequestToNode(UnicastZenPing.java:301)
at org.elasticsearch.discovery.zen.ping.unicast.UnicastZenPing.access$600(UnicastZenPing.java:77)
at org.elasticsearch.discovery.zen.ping.unicast.UnicastZenPing$3.run(UnicastZenPing.java:278)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:680)
Caused by: org.elasticsearch.transport.NodeNotConnectedException:
[][inet[localhost/127.0.0.1:9301]] Node not connected
at org.elasticsearch.transport.netty.NettyTransport.nodeChannel(NettyTransport.java:636)
at org.elasticsearch.transport.netty.NettyTransport.sendRequest(NettyTransport.java:448)
at org.elasticsearch.transport.TransportService.sendRequest(TransportService.java:181)
... 6 more

In the end, if I don't start the server BEFORE the client node, they
never find each other.

Is that normal? Am I doing something wrong?

Thanks
Cheers,
Jérémie

--
Jérémie 'ahFeel' BORDIER

When you specify only localhost in the unicast hosts, it will use localhost:9300. Explicitly specify both localhost:9300 and localhost:9301, and then the start order won't matter. Otherwise, you might end up with the client trying to discover with only itself on the list.
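
To make the suggestion above concrete, here is a minimal sketch of building an explicit host:port list for the unicast hosts setting. The `UnicastHosts` class and its `hostList` method are hypothetical helpers written for this example, not part of Elasticsearch, and the sketch assumes that `discovery.zen.ping.unicast.hosts` accepts a comma-separated string of entries.

```java
import java.util.StringJoiner;

// Hypothetical helper, not part of Elasticsearch: builds a value for
// "discovery.zen.ping.unicast.hosts" with one explicit host:port entry per port.
final class UnicastHosts {

    // Returns e.g. "localhost:9300,localhost:9301" for ports 9300..9301 (inclusive).
    static String hostList(String host, int firstPort, int lastPort) {
        if (firstPort > lastPort) {
            throw new IllegalArgumentException("firstPort must be <= lastPort");
        }
        StringJoiner joined = new StringJoiner(",");
        for (int port = firstPort; port <= lastPort; port++) {
            joined.add(host + ":" + port);
        }
        return joined.toString();
    }

    public static void main(String[] args) {
        // Server transport on 9300, client node on 9301, as in the setup in this thread.
        String hosts = hostList("localhost", 9300, 9301);
        System.out.println(hosts);
        // Assumption: this string would replace the bare "localhost" value, e.g.
        // .put("discovery.zen.ping.unicast.hosts", hosts)
    }
}
```

With both ports listed on the client (and, if wanted, on the server too), each side pings the other's transport address explicitly instead of relying on the default port.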
