Transport Client unable to resolve hostname in certain cases

I've been having an issue where my transport client communicates fine with
Elasticsearch on EC2; however, as soon as I rebuild the Elasticsearch box and
the IP changes, the TransportClient throws a NoNodeAvailableException. It seems
to keep the old IP rather than re-resolving the host to the latest IP. I am
using a hostname, not an actual IP, when adding the transport address. Am I not
setting something right, or is this a bug in the transport client?

Here is how I am setting up the transport client:

Settings settings = ImmutableSettings.settingsBuilder()
        .put("client.transport.sniff", false)
        .put("client.transport.ignore_cluster_name", true)
        .build();
transportClient = new TransportClient(settings);

// Register each configured server by hostname (not by IP)
String[] servers = clusterConfiguration.getServers();
if (servers != null) {
    for (String server : servers) {
        transportClient.addTransportAddress(
                new InetSocketTransportAddress(server, clusterConfiguration.getPort()));
    }
}

This is the exception I am getting when the Elasticsearch IP changes. Just to
make sure the box was resolving the hostname, I also pinged the host from the
same box to verify that it picked up the latest IP.

21:17:36.007 [elasticsearch[Time Bomb][generic][T#1]] DEBUG org.elasticsearch.client.transport - [Time Bomb] failed to connect to node [[#transport#-1][inet[elasticsearch-t1/10.60.95.159:9300]]], removed from nodes list
org.elasticsearch.transport.ConnectTransportException: [][inet[elasticsearch-t1/10.60.95.159:9300]] connect_timeout[30s]
    at org.elasticsearch.transport.netty.NettyTransport.connectToChannels(NettyTransport.java:665) ~[elasticsearch-0.20.5.jar:na]
    at org.elasticsearch.transport.netty.NettyTransport.connectToNode(NettyTransport.java:604) ~[elasticsearch-0.20.5.jar:na]
    at org.elasticsearch.transport.netty.NettyTransport.connectToNode(NettyTransport.java:574) ~[elasticsearch-0.20.5.jar:na]
    at org.elasticsearch.transport.TransportService.connectToNode(TransportService.java:127) ~[elasticsearch-0.20.5.jar:na]
    at org.elasticsearch.client.transport.TransportClientNodesService$SimpleNodeSampler.sample(TransportClientNodesService.java:302) ~[elasticsearch-0.20.5.jar:na]
    at org.elasticsearch.client.transport.TransportClientNodesService$ScheduledNodeSampler.run(TransportClientNodesService.java:281) [elasticsearch-0.20.5.jar:na]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) [na:1.7.0_15]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [na:1.7.0_15]
    at java.lang.Thread.run(Thread.java:722) [na:1.7.0_15]

[root@ip-10-110-81-210 ~]# ping elasticsearch-t1
PING elasticsearch-t1.d.simcloud.com (10.119.98.28) 56(84) bytes of data.
64 bytes from elasticsearch-t1.d.simcloud.com (10.119.98.28): icmp_seq=1 ttl=57 time=341 ms
64 bytes from elasticsearch-t1.d.simcloud.com (10.119.98.28): icmp_seq=2 ttl=57 time=1.61 ms


Being removed from the node list may be due to the node not addressing the
correct cluster name. I saw you use "client.transport.ignore_cluster_name" =
true. Note, this disables cluster name validation, so you can start a transport
client and try to detect different clusters, but it does not disable client
acceptance on the cluster side.
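
As a sketch (the cluster name "mycluster" is just a placeholder), you could set
the cluster name explicitly instead of ignoring it, so a mismatch shows up as a
validation error:

// Sketch: validate against an explicit cluster name instead of ignoring it
// ("mycluster" is a placeholder for the real cluster name).
Settings settings = ImmutableSettings.settingsBuilder()
        .put("cluster.name", "mycluster")
        .put("client.transport.ignore_cluster_name", false)
        .build();
TransportClient client = new TransportClient(settings);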

Jörg

On 02.04.2013 02:30, ElasticRook wrote:

[Time Bomb] failed to connect to node
[[#transport#-1][inet[elasticsearch-t1/10.60.95.159:9300]]], removed
from nodes list
org.elasticsearch.transport.ConnectTransportException:
[inet[elasticsearch-t1/10.60.95.159:9300]] connect_timeout[30s]


What you observe here is due to your setting "client.transport.sniff" =
false. If you disable sniffing, the TransportClient will not try to look
for other nodes of the cluster.
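
As a sketch (reusing the settings from your snippet), enabling sniffing would
look roughly like this:

// Sketch: with sniffing enabled, the client periodically asks the cluster for
// its current node list and updates its connections.
Settings settings = ImmutableSettings.settingsBuilder()
        .put("client.transport.sniff", true)
        .build();
TransportClient client = new TransportClient(settings);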

Jörg

On 02.04.2013 02:30, ElasticRook wrote:

I've been having an issue where my transport client communicates fine with
Elasticsearch on EC2; however, as soon as I rebuild the Elasticsearch box and
the IP changes, the TransportClient throws a NoNodeAvailableException.


Well, I currently have only one node in the cluster, so I kept sniffing off. I
think the problem is more that the transport client keeps the old IP and does
not re-resolve to the new one when the IP changes on the node box. I had to
restart the service that uses the transport client for it to resolve the
hostname again and pick up the latest IP.
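
As a possible workaround (just a sketch, not something I have verified;
oldAddress is a placeholder for the stale address), the address could be
removed and re-added so the hostname gets resolved again without restarting the
whole service:

// Sketch: drop the stale address and re-add it by hostname to force a fresh
// DNS lookup (oldAddress is a placeholder for the previously added address).
transportClient.removeTransportAddress(oldAddress);
transportClient.addTransportAddress(
        new InetSocketTransportAddress(server, clusterConfiguration.getPort()));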

On Tuesday, April 2, 2013 1:01:48 AM UTC-7, Jörg Prante wrote:

What you observe here is due to your setting "client.transport.sniff" =
false. If you disable sniffing, the TransportClient will not try to look
for other nodes of the cluster.

Jörg

On 02.04.2013 02:30, ElasticRook wrote:

I've been having an issue where my transport client communicates fine with
Elasticsearch on EC2; however, as soon as I rebuild the Elasticsearch box and
the IP changes, the TransportClient throws a NoNodeAvailableException.


For failover, you need at least two data nodes that your client is connected
to. If you have only one node, a client connection may fail without being able
to recover.
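
For example (the host names below are placeholders), the client would then
register more than one address:

// Sketch: register several node addresses so the client can fail over when one
// node becomes unreachable ("es-node1"/"es-node2" are placeholder host names).
transportClient
        .addTransportAddress(new InetSocketTransportAddress("es-node1", 9300))
        .addTransportAddress(new InetSocketTransportAddress("es-node2", 9300));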

Jörg

On 02.04.13 18:23, ElasticRook wrote:

Well, I currently have only one node in the cluster, so I kept sniffing off. I
think the problem is more that the transport client keeps the old IP and does
not re-resolve to the new one when the IP changes on the node box. I had to
restart the service that uses the transport client for it to resolve the
hostname again and pick up the latest IP.

On Tuesday, April 2, 2013 1:01:48 AM UTC-7, Jörg Prante wrote:

What you observe here is due to your setting "client.transport.sniff" = false.
If you disable sniffing, the TransportClient will not try to look for other
nodes of the cluster.

Jörg

On 02.04.2013 02:30, ElasticRook wrote:
> I've been having an issue where my transport client communicates fine with
> Elasticsearch on EC2; however, as soon as I rebuild the Elasticsearch box
> and the IP changes, the TransportClient throws a NoNodeAvailableException.


Thanks for responding. Yes, I agree; I will be adding new nodes soon, and in
that case I will turn sniffing on. However, that will not solve my original
issue.

On Tuesday, April 2, 2013 9:43:02 AM UTC-7, Jörg Prante wrote:

For failover, you need two data nodes your client is connected to. If
you have only one node, a client connection may fail without being able
to recover.

Jörg

On 02.04.13 18:23, ElasticRook wrote:

Well, I currently have only one node in the cluster, so I kept sniffing off. I
think the problem is more that the transport client keeps the old IP and does
not re-resolve to the new one when the IP changes on the node box. I had to
restart the service that uses the transport client for it to resolve the
hostname again and pick up the latest IP.

On Tuesday, April 2, 2013 1:01:48 AM UTC-7, Jörg Prante wrote:

What you observe here is due to your setting "client.transport.sniff" = false.
If you disable sniffing, the TransportClient will not try to look for other
nodes of the cluster.

Jörg 

On 02.04.2013 02:30, ElasticRook wrote:
> I've been having an issue where my transport client communicates fine with
> Elasticsearch on EC2; however, as soon as I rebuild the Elasticsearch box
> and the IP changes, the TransportClient throws a NoNodeAvailableException.


The TransportClient maintains a persistent connection to one or more nodes of a
cluster. But only by actively using such a connection can the system learn
whether the connection is still valid (and it times out or switches over to
another connection if it is no longer usable).

Note, a transport client does not store or receive the cluster state and has no
knowledge of the internal network state of the cluster it is currently
connected to. If you change the cluster network state, you can't expect a
transport client to be able to track it. A TransportClient is not directly
attached to the cluster (for instance, it is invisible to other transport
clients).

If you want tighter client integration, you can use a node client, which is
aware of the current cluster network state. You might see error messages
appearing earlier in the log when connections become unusable, but you will
also notice client reconnects, I'm quite sure.
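
As a minimal sketch (the cluster name is a placeholder), creating such a
client-only node looks roughly like this:

// Sketch: a client-only node that joins the cluster and tracks the cluster
// state, unlike a TransportClient ("mycluster" is a placeholder).
Node node = NodeBuilder.nodeBuilder()
        .clusterName("mycluster")
        .client(true)   // holds no data and is not master-eligible
        .node();
Client client = node.client();
// ... use the client, then shut the node down when done ...
node.close();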

Jörg

On 02.04.13 19:12, ElasticRook wrote:

Thanks for responding. Yes, I agree; I will be adding new nodes soon, and in
that case I will turn sniffing on. However, that will not solve my original
issue.


Jörg,

I haven't had much of a chance to explore multi-node clusters in a
heavy-usage fail-over environment. I use static discovery, and give each
node and my TransportClient the list of all of the IP addresses. So when
one of the nodes goes off-line, the TransportClient still has other nodes
it can use and fail-over seems to work just fine.

Now I'm reading the most recent ES documentation on the Elastic website, and a
few things aren't clear to me.

  1. Can I (indeed, should I) create just one NodeClient that is shared by
    all threads within an application? This is the current usage of the
    TransportClient, and migration of my code would be easier if the same usage
    pattern (one NodeClient shared by all threads) was acceptable.

  2. Since I can't pass a list of addresses, can I pass the list of two or more
    node addresses on the Java command line via
    -Des.discovery.zen.ping.unicast.hosts=$HOSTS (as I do when I start ES
    itself, since I don't have a per-installation unique elasticsearch.yml
    file)?

  3. And if the NodeClient is on the same localhost as the one ES server data
    node (which is the case on my laptop and on several of our in-house QA
    systems), does passing in the cluster name also look on the local host?
    (When I create a TransportClient, I give it the one localhost address).

Thanks for any insights and corrections you can give me!

Regards,
Brian

On Tuesday, April 2, 2013 4:08:05 PM UTC-4, Jörg Prante wrote:

The TransportClient maintains a persistent connection to one or more nodes of a
cluster. But only by actively using such a connection can the system learn
whether the connection is still valid (and it times out or switches over to
another connection if it is no longer usable).

Note, a transport client does not store or receive the cluster state and has no
knowledge of the internal network state of the cluster it is currently
connected to. If you change the cluster network state, you can't expect a
transport client to be able to track it. A TransportClient is not directly
attached to the cluster (for instance, it is invisible to other transport
clients).

If you want tighter client integration, you can use a node client, which is
aware of the current cluster network state. You might see error messages
appearing earlier in the log when connections become unusable, but you will
also notice client reconnects, I'm quite sure.

Jörg


  1. Yes, a NodeClient can be used by many threads.

  2. Absolutely, a NodeClient is in fact based on an ordinary node behind the
    scenes, with all the bells and whistles of discovery, configuration,
    logging, settings ...

  3. You can configure a NodeClient as you would a data node (see the
    "network.*" settings), and because binding to all interfaces is the
    default, I think localhost will also be used if you do not configure the
    NodeClient at all. At least, the hostname's IP is used. There may in some
    cases be JVM-related issues in hostname -> IP resolution if /etc/hosts is
    messed up and both localhost and the hostname point to 127.0.0.1, but this
    is not ES specific. (A short sketch follows below.)
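
A rough sketch of pinning the node client's bind address explicitly (the
address value is just an example):

// Sketch: bind the node client to a specific interface instead of relying on
// the default of binding to all interfaces (the value is only an example).
Settings settings = ImmutableSettings.settingsBuilder()
        .put("network.host", "127.0.0.1")
        .build();
Node node = NodeBuilder.nodeBuilder()
        .client(true)
        .settings(settings)
        .node();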

Jörg


Jörg,

Thank you very much for the quick response. I've updated all of my servers
and command-line drivers to accept the Client interface instead of the
TransportClient object, and then in one place I can optionally create
either a TransportClient or a NodeClient and then pass its reference along
as a Client.

I noticed (via TRACE-level logging) that when I create a NodeClient but
don't configure it at all, it goes into zen multicast discovery on
localhost. That works fine.

Then I updated my Java code to configure the NodeClient as follows:

ImmutableSettings.Builder settingsBuilder = ImmutableSettings.settingsBuilder();
settingsBuilder.put("discovery.zen.ping.multicast.enabled", false);
settingsBuilder.put("discovery.zen.ping.unicast.hosts", hostNamesToString());

Node node = nodeBuilder()
        .clusterName(clusterName)
        .client(true)
        .settings(settingsBuilder)
        .node();
client = node.client();

This works also (where hostNamesToString converts an array of host names to
a comma-separated string). And I finally resurrected my 3-host cluster to
verify this. But I now have some additional questions:

  1. With static discovery, if I just specify one of the host names of the
    3-host cluster, will NodeClient discover the other two hosts? Or does it
    work exactly like a data node in which it must know about all of the other
    nodes when zen multicast is disabled?

  2. Zen multicast is disabled for my data nodes due to recommendations on
    this newsgroup as part of the strategy for avoiding split-brain situations.
    But for this client-only non-data NodeClient, would you recommend leaving
    zen multicast on and letting it dynamically find the hosts that are running
    that cluster?

Thanks so much for your insights and recommendations!

Regards,
Brian


Jörg,

By the way, when I said that multicast worked, it worked when running on my
laptop against the locally-running single-host cluster.

When I enable Zen multicast, with or without a host list, I see a repeated set
of log entries in which no master node can be found. For example:

TRACE org.elasticsearch.discovery.zen - [Jarvis, Edwin] no masterNode returned

Regards,
Brian


  1. Yes, it is enough for discovery to detect even a single member of the
    cluster in order to join it. You do not need to specify all the cluster
    members, though listing a few is better, to ensure discovery always
    succeeds.

  2. Zen discovery and split-brain situations are two different topics. In most
    cases, if only nodes go down and come up and the network is stable, Zen
    discovery works fine. The assumption is that, per multicast request, all
    nodes can "see" the event and react properly. But if network connections
    fail while the nodes stay up and continue to run, the cluster may enter a
    so-called byzantine situation. A bad case is a 50%:50% split: the nodes
    would have to agree on how to continue without knowing what the other half
    of the nodes are currently doing. To avoid this, the total cluster node
    count should be odd, not even, and the setting
    discovery.zen.minimum_master_nodes should be set to at least half the
    cluster node count plus one. That way, if only a minority of nodes split
    off, they are told not to elect a new master and can safely rejoin the
    cluster later (see the sketch below).
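
A rough sketch of those quorum settings for a three-node cluster (the host
names are placeholders):

// Sketch: quorum settings for a 3-node cluster; minimum_master_nodes = 3/2 + 1 = 2.
// The unicast host names are placeholders.
Settings settings = ImmutableSettings.settingsBuilder()
        .put("discovery.zen.minimum_master_nodes", 2)
        .put("discovery.zen.ping.multicast.enabled", false)
        .put("discovery.zen.ping.unicast.hosts", "es-node1,es-node2,es-node3")
        .build();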

The message "no masterNode returned" is a trace-level message and is just
informational. The node client's discovery did not find a master node among the
nodes that answered the multicast request (if any nodes answered at all), so
the next step is to elect a new master.

Jörg
