Different behavior in unicast, between ver 17.1 and 17.0?


(fucema) #1

Hi Shay,

I have upgraded from 0.16.x up to 0.17.0 in the past and had no problems
connecting a client-only node to a master node using unicast settings.
Has there been a change in 0.17.1 regarding networking
between multicast clusters and unicast clients?

After upgrading to 0.17.1, the unicast client-only node is no longer
"seeing" the master node and does not connect to the cluster. Cluster
names have not changed. No other changes were made to the configuration,
except for updating the Elasticsearch jar to 0.17.1. I have also deleted the
data folders to start the cluster fresh. The master node has
default network settings (i.e., it uses multicast).

Here is the client-only Java startup snippet:

	Settings settings = settingsBuilder()
			.put("multicast.enabled", false)
			.put("client.transport.sniff", true)
			.put("cluster.name", "my-foo-cluster")
			.put("discovery.zen.ping.unicast.hosts", "master.foo.bar:9300")
			.build();
	this.node = nodeBuilder().settings(settings).client(true).node();

Here is the master Elasticsearch config:

	cluster.name: my-foo-cluster
	path.data: /var/data/elasticsearch
	path.logs: /var/log/elasticsearch
	bootstrap.mlockall: true

(fucema) #2

A followup to this problem.

I have tested the unicast discovery and it works fine when running the
client and master node on the same computer.

The problem posted above involves a master node and a client running on
different subnets (hence the unicast discovery from the client
node). The master node is bound to the 192.168.110.xx subnet, and the
client-only node is on the 192.168.109.xx subnet.
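
Since the two nodes sit on different subnets, one thing worth ruling out first is plain TCP reachability of the master's transport port from the client machine. Below is a minimal sketch using only the JDK. The class name `TransportPortCheck` and the 2-second timeout are illustrative assumptions, and a successful connect only shows that the port is reachable, not that discovery will succeed.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

// Hypothetical helper, not part of Elasticsearch: checks whether a TCP
// connection to host:port can be opened within the given timeout.
class TransportPortCheck {

    static boolean canConnect(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            // Throws an IOException if the host cannot be resolved, the
            // connection is refused, or the timeout expires.
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        if (args.length != 2) {
            System.err.println("usage: TransportPortCheck <host> <port>");
            return;
        }
        String host = args[0];
        int port = Integer.parseInt(args[1]);
        boolean ok = canConnect(host, port, 2000); // 2 s timeout is an arbitrary choice
        System.out.println(host + ":" + port + (ok ? " is reachable" : " is NOT reachable"));
    }
}
```

On Java 11 or later this could be run from the client machine as `java TransportPortCheck.java master.foo.bar 9300`, using the host and port from the unicast setting above.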

Again, prior to 0.17.1, the client node had no problem connecting to
the master. Nothing else that I can think of has changed, other than
updating the elastic jar to 0.17.1.

I just tested that unicast discovery works on the same subnet using a
client only node and a master node running on the same machine. The
client only node is configured for unicast ping discovery, and the
master node is using default network configuration.

Anyone else having this problem?

  • Seon

(fucema) #3

OK, I enabled TRACE logging on the discovery module and pasted a portion of the output here:

https://gist.github.com/1105511

The client node (May Parker) is trying to connect to 192.168.109.138:9300 (the master node is Shuma-Gorath) yet it looks like a different node (Longshot) is responding instead.

Of course, Longshot has a different cluster name, so May Parker should not be joining it.

I can't decipher the logs properly, but it seems like the presence of Longshot (on the same subnet as May Parker but assigned a different cluster name) is interfering with May Parker joining Shuma-Gorath.

I don't have the time now, but I'm guessing that if I shut down Longshot everything will work as expected.
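
To make the expectation above concrete: responses from nodes with a different cluster name should be dropped before a node decides whom to join. The following is only an illustrative sketch of that rule, not Elasticsearch's actual discovery code. `PingResponse`, `sameCluster`, and the cluster name `some-other-cluster` for Longshot are hypothetical (Longshot's real cluster name is not given in this thread).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative only: a stand-in for filtering discovery ping responses
// by cluster name. None of these types come from Elasticsearch.
class ClusterNameFilter {

    // Hypothetical minimal ping response: who answered, and for which cluster.
    static final class PingResponse {
        final String nodeName;
        final String clusterName;

        PingResponse(String nodeName, String clusterName) {
            this.nodeName = nodeName;
            this.clusterName = clusterName;
        }
    }

    // Keeps only the responses whose cluster name matches the expected one.
    static List<PingResponse> sameCluster(List<PingResponse> responses, String expectedCluster) {
        List<PingResponse> accepted = new ArrayList<>();
        for (PingResponse response : responses) {
            if (expectedCluster.equals(response.clusterName)) {
                accepted.add(response);
            }
        }
        return accepted;
    }

    public static void main(String[] args) {
        List<PingResponse> responses = Arrays.asList(
                new PingResponse("Shuma-Gorath", "my-foo-cluster"),
                new PingResponse("Longshot", "some-other-cluster")); // assumed name
        for (PingResponse response : sameCluster(responses, "my-foo-cluster")) {
            System.out.println("accepted: " + response.nodeName);
        }
    }
}
```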


(fucema) #4

Here is additional output, captured from the start of the node initialization.

https://gist.github.com/1105530

In this sample, the client only node is named [Bes]. The target master that Bes is trying to connect to should be [Shuma-Gorath]. Again, [Longshot] seems to be getting in the way of success.

Any ideas?


(Shay Banon) #5

Heya, first, the logic did not change between 0.17.0 and 0.17.1, but after
analyzing the output you provided, I think I found the problem. It depends
on the ordering of events, so it's not something that shows up consistently,
and you might have only just started to see it.

Here is the issue:
https://github.com/elasticsearch/elasticsearch/issues/1159. The fix will be part
of 0.17.2 (which will be released either today or tomorrow).

On Tue, Jul 26, 2011 at 2:27 AM, fucema fucema@gmail.com wrote:

Here is additional output, captured from the start of the node
initialization.

https://gist.github.com/1105530

In this sample, the client only node is named [Bes]. The target master that
Bes is trying to connect to should be [Shuma-Gorath]. Again, [Longshot]
seems to be getting in the way of success.

Any ideas?



(fucema) #6

Thanks Kimchy, you rock. Glad to know it wasn't user error this time. :wink:

