(ES 0.90.1) Cannot connect to elasticsearch cluster after a node is removed


(deer) #1

Hi All,

Here are the logs for this case.

Node 10.1.4.196 was removed at 14:08 because the machine rebooted. The client kept trying to connect to the Elasticsearch cluster but failed.
Master Node :
[2014-03-08 14:08:26,531][INFO ][cluster.service ] [10.1.4.197:9202] removed {[10.1.4.196:9202][_sJrum34QWGqEkv8CvAtow][inet[/10.1.4.196:9302]],},
reason: zen-disco-node_failed([10.1.4.196:9202][_sJrum34QWGqEkv8CvAtow][inet[/10.1.4.196:9302]]), reason failed to ping, tried [3] times, each with maximum [30s] timeout

Client :
2014-03-08 14:15:36,184 WARN org.elasticsearch.transport.netty - [Bulldozer] exception caught on transport layer [[id: 0x50dc218f]], closing connection
java.net.NoRouteToHostException: No route to host

(At this point the cluster health was yellow and there were no unassigned shards.)

The node came back at 14:25, and the client could connect to the cluster again.
Client :
2014-03-08 14:25:20,597 WARN org.elasticsearch.transport.netty - [Bulldozer] exception caught on transport layer [[id: 0xf24d85d7]], closing connection
java.net.NoRouteToHostException: No route to host

Master Node :
[2014-03-08 14:25:57,984][INFO ][cluster.service ] [10.1.4.197:9202] added {[10.1.4.196:9202][rFZ7k7XSSY231EgPoDfmFw][inet[/10.1.4.196:9302]],}, reason: zen-disco-receive(join from node[[10.1.4.196:9202][rFZ7k7XSSY231EgPoDfmFw][inet[/10.1.4.196:9302]]])

(At this point the cluster health was green.)

In this case, I would expect the client to be able to connect to the cluster even though a node has been removed from it.

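The client connection is created as follows:

    import org.elasticsearch.client.transport.TransportClient;
    import org.elasticsearch.common.settings.ImmutableSettings;
    import org.elasticsearch.common.settings.Settings;
    import org.elasticsearch.common.transport.InetSocketTransportAddress;

    // Sniffing lets the client discover the other nodes of the cluster.
    Settings settings = ImmutableSettings.settingsBuilder()
            .put("cluster.name", "clustername")
            .put("client.transport.sniff", true)
            .build();
    TransportClient client = new TransportClient(settings);

    // Seed addresses; 10.1.4.196 is the node that was rebooted.
    client.addTransportAddress(new InetSocketTransportAddress(
            "10.1.4.195" /* hostname */, 9300 /* port */));
    client.addTransportAddress(new InetSocketTransportAddress(
            "10.1.4.196" /* hostname */, 9300 /* port */));
    client.addTransportAddress(new InetSocketTransportAddress(
            "10.1.4.197" /* hostname */, 9300 /* port */));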

The master node is 10.1.4.197, and the node that was removed is 10.1.4.196.

All cluster settings use their defaults except discovery.zen.minimum_master_nodes, which is set to 3.
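
For context on that setting: the Elasticsearch documentation recommends sizing discovery.zen.minimum_master_nodes as a quorum of the master-eligible nodes, (N / 2) + 1. A minimal sketch of that arithmetic follows. The class and method names are hypothetical, and the node counts are only illustrative, because the actual number of master-eligible nodes in this cluster is not stated here.

```java
// Sketch only: quorum arithmetic for discovery.zen.minimum_master_nodes.
// Assumption: the node counts used in main() are illustrative, not the real cluster size.
final class MasterQuorum {

    /** Quorum of master-eligible nodes: (n / 2) + 1, using integer division. */
    static int quorum(int masterEligibleNodes) {
        if (masterEligibleNodes < 1) {
            throw new IllegalArgumentException("need at least one master-eligible node");
        }
        return masterEligibleNodes / 2 + 1;
    }

    /** True if enough master-eligible nodes remain after losing some of them. */
    static boolean canElectMaster(int masterEligibleNodes, int minimumMasterNodes, int lostNodes) {
        return masterEligibleNodes - lostNodes >= minimumMasterNodes;
    }

    public static void main(String[] args) {
        for (int n : new int[] {3, 4, 5}) {
            System.out.println(n + " master-eligible nodes -> quorum " + quorum(n));
        }
        // Hypothetical 3-node cluster losing one node, with the setting at 3 and at 2.
        System.out.println("setting 3, one node lost: " + canElectMaster(3, 3, 1));
        System.out.println("setting 2, one node lost: " + canElectMaster(3, 2, 1));
    }
}
```

If the cluster had exactly three master-eligible nodes, a value of 3 would leave no room to lose one, whereas 2 is the quorum; with five such nodes, 3 is the quorum itself.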

Is there a problem with these settings that could cause this issue?

Thanks.

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/b1f3adf5-723b-49aa-bffe-674c5ce930e5%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


(Mark Walkom) #2

It looks like a networking issue, at least based on the "No route to host" in
the error.
Can you ping the master while this is happening? What about doing a telnet
test?
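
The telnet test can also be approximated from the client machine in Java. This is a minimal sketch, assuming the nodes listen on transport port 9300 as in the client configuration from the original post; PortCheck and canConnect are made-up names, and the 3-second timeout is arbitrary.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

// Sketch only: a telnet-style TCP reachability check using the JDK.
final class PortCheck {

    /** Returns true if a TCP connection to host:port opens within timeoutMillis. */
    static boolean canConnect(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            // Covers NoRouteToHostException, ConnectException and SocketTimeoutException.
            System.out.println(host + ":" + port + " -> " + e);
            return false;
        }
    }

    public static void main(String[] args) {
        // Addresses taken from the client configuration in the original post.
        String[] hosts = {"10.1.4.195", "10.1.4.196", "10.1.4.197"};
        for (String host : hosts) {
            System.out.println(host + ":9300 reachable = " + canConnect(host, 9300, 3000));
        }
    }
}
```

Running it while the node is down should show whether only 10.1.4.196 is unreachable or whether the other nodes are affected too.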

Regards,
Mark Walkom

Infrastructure Engineer
Campaign Monitor
email: markw@campaignmonitor.com
web: www.campaignmonitor.com



(deer) #3

Hi Mark,

Thanks for replying.

The master (10.1.4.197) and the other nodes can be reached, while the problem
node (10.1.4.196) cannot.
So we could still see the cluster status at that moment:

"status" : "yellow",
"timed_out" : false,
"unassigned_shards" : 0,



(Dome.C.Wei) #4

That must be because the service is not running.



(deer) #5

Hi Dome,

Do you mean the service on 10.1.4.196 is not running? Yes, the service would
have been stopped when the machine was rebooted.

But the master node 10.1.4.197 removed the problem node 10.1.4.196 when
it could no longer ping it.

The cluster should be fine after that. Am I misunderstanding something?

Thanks



(echin1999-2) #6

Hi.
That would be my assumption as well. By the way, I am getting the same
warning that you are, in a very similar scenario: 2 nodes in a cluster, all
works fine when everything is running, and the warning appears on the client
if one of the nodes is taken down.

I am using v. 0.90 - not sure if that matters.



(echin1999-2) #7

One more thing - I notice that functionally, the client is still able to
communicate with the remaining active node, so I guess this warning is just
a "warning". There must be some background thread that periodically looks for
the missing node, while the main Client instance can still communicate with
the active node. Would you be able to verify whether it's merely a warning for
you? If so, I might just not worry about this for now.
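
A rough sketch of that guess, using only the JDK: a sampler re-probes every configured address on each round, logs the ones it cannot reach, and serves requests from the rest. This is not the TransportClient implementation; NodeSamplerSketch, sample, and pick are hypothetical names, and the probe is faked so the example runs without a cluster (Java 8+ for the lambda).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.function.Predicate;

// Sketch only: illustrates "keep warning about the dead node, keep using the live ones".
final class NodeSamplerSketch {
    private final List<String> configured;      // addresses given to the client
    private final Predicate<String> probe;      // hypothetical reachability check
    private List<String> live = Collections.emptyList();
    private int next = 0;

    NodeSamplerSketch(List<String> configured, Predicate<String> probe) {
        this.configured = new ArrayList<>(configured);
        this.probe = probe;
    }

    /** One sampling round: re-probe every configured address, including dead ones. */
    synchronized void sample() {
        List<String> reachable = new ArrayList<>();
        for (String address : configured) {
            if (probe.test(address)) {
                reachable.add(address);
            } else {
                System.out.println("WARN cannot reach " + address + ", will retry next round");
            }
        }
        live = reachable;
    }

    /** Round-robin pick among the nodes that answered the last sampling round. */
    synchronized String pick() {
        if (live.isEmpty()) {
            throw new IllegalStateException("no node available");
        }
        int index = next % live.size();
        next = index + 1;
        return live.get(index);
    }

    public static void main(String[] args) {
        NodeSamplerSketch sampler = new NodeSamplerSketch(
                Arrays.asList("10.1.4.195:9300", "10.1.4.196:9300", "10.1.4.197:9300"),
                address -> !address.startsWith("10.1.4.196:"));  // pretend .196 is down
        sampler.sample();
        for (int i = 0; i < 4; i++) {
            System.out.println("request " + i + " -> " + sampler.pick());
        }
    }
}
```

In this sketch the warning for the unreachable address repeats on every round, while requests alternate between the two reachable addresses.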



(deer) #8

Hi Echin,

Since the problem node's IP is listed in the client's ES connection through the
Java API, I guess the client will still try to connect to this node, which is
why there are such warnings.

It should be fine for the client to keep working with the cluster. However, in
my case, the Java client is not reachable and times out (through the HTTP
protocol).

I will set up a test cluster with the same settings to check whether the
client works in this condition.

Thanks.

On Thursday, March 13, 2014 11:54:53 PM UTC+8, echin1999 wrote:

One more thing - I notice that functionally, the client is still able to
communicate to the remaining active node. so I guess this warning is just
a "warning". must be some background thread that periodically looks for
the missing node, while the main Client instance can still communicate to
the active node. would you be able to verify if its merely a warning for
you? if so, i might just not worry about this for now.

On Thursday, March 13, 2014 5:15:36 AM UTC-4, Hui wrote:

Hi Dome,

Do you mean the service of 10.1.4.196 is not open? Yes, the service
should be stopped when it was rebooted.

But the master node 10.1.4.197 has removed the problem node 10.1.4.196
when it cannot ping the machine 10.1.4.196.

The cluster should be fine after this operation. Do I understand it
wrongly?

Thanks

On Thursday, March 13, 2014 4:48:17 PM UTC+8, Dome.C.Wei wrote:

That must be the service not open.

在 2014年3月13日星期四UTC+8下午2时10分22秒,Hui写道:

Hi Mark,

Thanks for replying.

The master (10.1.4.197) and other nodes can be reached while the
problem node(10.1.4.196) is not reachable.
So, we can see the cluster status at that moment

"status" : "yellow",
"timed_out" : false,
"unassigned_shards" : 0,

On Thursday, March 13, 2014 2:03:44 PM UTC+8, Mark Walkom wrote:

It looks like a networking issue, at least based on "No route to host"
in the error.
Can you ping the master when this is happening, what about doing a
telnet test?

Regards,
Mark Walkom

Infrastructure Engineer
Campaign Monitor
email: ma...@campaignmonitor.com
web: www.campaignmonitor.com

On 13 March 2014 16:54, Hui dannyh...@gmail.com wrote:

Hi All,

This is the log for the case.

The node 10.1.4.196 was removed at 14:08 due to a machine reboot; the client kept trying to connect to the elasticsearch cluster but failed.

Master Node :
[2014-03-08 14:08:26,531][INFO ][cluster.service ] [10.1.4.197:9202] removed {[10.1.4.196:9202][_sJrum34QWGqEkv8CvAtow][inet[/10.1.4.196:9302]],},
reason: zen-disco-node_failed([10.1.4.196:9202][_sJrum34QWGqEkv8CvAtow][inet[/10.1.4.196:9302]]), reason failed to ping, tried [3] times, each with maximum [30s] timeout

Client :
2014-03-08 14:15:36,184 WARN org.elasticsearch.transport.netty - [Bulldozer] exception caught on transport layer [[id: 0x50dc218f]], closing connection
java.net.NoRouteToHostException: No route to host

(The cluster health at this moment is Yellow and there is no unassigned shard.)

The node came back at 14:25, and the client could successfully connect to the cluster again.

Client :

2014-03-08 14:25:20,597 WARN org.elasticsearch.transport.netty - [Bulldozer] exception caught on transport layer [[id: 0xf24d85d7]], closing connection
java.net.NoRouteToHostException: No route to host

Master Node :

[2014-03-08 14:25:57,984][INFO ][cluster.service ] [10.1.4.197:9202] added {[10.1.4.196:9202][rFZ7k7XSSY231EgPoDfmFw][inet[/10.1.4.196:9302]],}, reason: zen-disco-receive(join from node[[10.1.4.196:9202][rFZ7k7XSSY231EgPoDfmFw][inet[/10.1.4.196:9302]]])

(The cluster health at this moment is Green.)

In the above case, the client should be able to connect to the cluster even when a node is removed from the cluster.

For the client, the connection is created as follows:

    Settings settings = ImmutableSettings.settingsBuilder()
            .put("cluster.name", "clustername")
            .put("client.transport.sniff", true)
            .build();

    TransportClient client = new TransportClient(settings);

    client.addTransportAddress(new InetSocketTransportAddress(
            "10.1.4.195" /* hostname */, 9300 /* port */));
    client.addTransportAddress(new InetSocketTransportAddress(
            "10.1.4.196" /* hostname */, 9300 /* port */));
    client.addTransportAddress(new InetSocketTransportAddress(
            "10.1.4.197" /* hostname */, 9300 /* port */));

The master node is 10.1.4.197, while the node being removed is
10.1.4.196.

For the cluster settings, everything uses the defaults except
discovery.zen.minimum_master_nodes, which is set to 3.

Is there anything in the above settings that could cause this issue?

Thanks.



(deer) #9

Hi All,

After testing in another cluster, I found that the client can connect to
the cluster, but it is very slow.

While the node is down, every request that normally takes ~50ms takes
41732ms to 85984ms; the cluster health is Yellow and there are no
unassigned shards.

Latency returns to ~50ms after the problem node rejoins.

There are no exceptions in the master node's log.

Thanks



(deer) #10

Sorry All.

I've verified that the slowness is related to the ping failure reported in the master log:
reason failed to ping, tried [3] times, each with maximum [30s] timeout
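
As a rough sanity check, the log line above can be turned into an upper bound on how long the master may wait before dropping the node, assuming each of the attempts runs to its full timeout (retries × timeout). This is only a sketch: `PingWindow` and `worstCaseMillis` are made-up names for illustration, and the unit handling (`ms`, `s`, `m`) is an assumption about how the timeout is printed.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not an Elasticsearch API.
class PingWindow {

    private static final Pattern PING_REASON = Pattern.compile(
            "tried \\[(\\d+)\\] times, each with maximum \\[(\\d+)(ms|s|m)\\] timeout");

    /** Returns retries * timeout in milliseconds, or -1 if the line has no ping-failure reason. */
    static long worstCaseMillis(String logLine) {
        Matcher m = PING_REASON.matcher(logLine);
        if (!m.find()) {
            return -1;
        }
        long retries = Long.parseLong(m.group(1));
        long timeout = Long.parseLong(m.group(2));
        String unit = m.group(3);
        long unitMillis = unit.equals("ms") ? 1L : unit.equals("s") ? 1000L : 60000L;
        return retries * timeout * unitMillis;
    }

    public static void main(String[] args) {
        String line = "reason failed to ping, tried [3] times, each with maximum [30s] timeout";
        long bound = worstCaseMillis(line);
        System.out.println("worst-case ping window: " + bound + " ms");
        // Request latencies reported earlier in the thread, in milliseconds.
        long[] observed = {41732, 85984};
        for (long latency : observed) {
            System.out.println(latency + " ms within window: " + (latency <= bound));
        }
    }
}
```

With 3 tries of 30s each, the bound is 90000 ms, and both reported latencies (41732 ms and 85984 ms) fall below it. That is consistent with requests stalling until the failed node is dropped, though it does not prove it.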

Thanks.



(system) #11