Nodes randomly disconnected from the ES cluster

Anil_Karaka · April 1, 2015, 6:45am

I grepped for "removed" in the master node's logs, and these are the entries I see.

[2015-04-01 05:32:55,813][INFO ][cluster.service ] [ESBigNode3] removed {[ES30GBNode2][Yf8ODQh0TE2_0hQ35Y0M_w][ip-153-31-43-55][inet[/153.31.73.55:9300]],}, reason: zen-disco-node_failed([ES30GBNode2][Yf8ODQh0TE2_0hQ35Y0M_w][ip-153-31-73-55][inet[/153.31.73.55:9300]]), reason transport disconnected
[2015-04-01 05:33:02,048][INFO ][cluster.service ] [ESBigNode3] removed {[ES30GBNode1][0CRaC261RXy8JfGc1XNLZA][ip-153-31-76-111][inet[/153.31.76.111:9300]],}, reason: zen-disco-node_failed([ES30GBNode1][0CRaC261RXy8JfGc1XNLZA][ip-153-31-36-101][inet[/153.31.36.101:9300]]), reason transport disconnected
[2015-04-01 05:33:09,702][INFO ][cluster.service ] [ESBigNode3] removed {[ESBigNode5][PaNaDPwfSM-jUpGa8HQJmQ][esnode5][inet[/153.31.70.128:9300]],}, reason: zen-disco-node_failed([ESBigNode5][PaNaDPwfSM-jUpGa8HQJmQ][esnode5][inet[/153.31.70.128:9300]]), reason transport disconnected
[2015-04-01 05:33:13,964][INFO ][cluster.service ] [ESBigNode3] removed {[ESBigNode1][ihJU17ToQVit9BxNzQjhnQ][esnode1][inet[/153.31.75.190:9300]],}, reason: zen-disco-node_failed([ESBigNode1][ihJU17ToQVit9BxNzQjhnQ][esnode1][inet[/153.31.35.190:9300]]), reason transport disconnected

And on the data node, this is what a node leaving the cluster looks like
in its log files.

[2015-01-22 20:49:56,860][WARN ][discovery.ec2 ] [ESBigNode1] master left (reason = do not exists on master, act as master failure), current nodes: {[ESBigNode2][zVdCNza9Qk-v-Usu66jcvw][ip-153-31-73-29][inet[/153.31.73.29:9300]],[ESBigNode4][-8pj8n2sS5GB4XTIE0zudQ][ip-153-31-74-230][inet[/153.31.74.230:9300]],[ESBigNode1][nU6bkV-SSb6rvLHsth9AQg][ip-153-31-75-190][inet[/153.31.75.190:9300]],}

That is 4 nodes leaving the 7-node cluster at one time, and the cluster stays
in red state for a few minutes, not just yellow.
Although 4 nodes leaving together is rare, single nodes leave the
cluster very often.
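
To put a number on "very often", the "removed" entries in the master log could be tallied per node. The following is only a sketch: the regular expression is inferred from the log lines quoted above, and the function name `count_removals` is made up for illustration.

```python
import re
from collections import Counter

# Pattern inferred from the cluster.service log lines quoted in this post.
# \s* also matches line breaks, so entries wrapped by a mail client still match.
REMOVED = re.compile(
    r"\[\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}\]"
    r"\[INFO\s*\]\[cluster\.service\s*\]\s*\[[^\]]+\]\s*"
    r"removed\s*\{\[([^\]]+)\]"
)


def count_removals(log_text):
    """Return a Counter mapping each removed node name to its removal count."""
    return Counter(REMOVED.findall(log_text))
```

Feeding it the contents of the master's log file and printing `count_removals(text).most_common()` would show which nodes drop out most frequently.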

As discussed in this thread,
https://groups.google.com/forum/#!msg/elasticsearch/ixoAF9Yur0E/CgX4Hbk1ynYJ
I will change discovery.zen.ping.timeout to 10 seconds. What else can I do?

There is also an older thread from 2012 that suggests changing the OS-level
IPv4 TCP keepalive settings. Do I have to change those as well?
https://groups.google.com/forum/#!msg/elasticsearch/c9JmaiVfBb0/9XZM6ZJpoBwJ
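
Before changing anything, it may help to see what the keepalive settings currently are on each node. A minimal sketch, assuming Linux hosts where these values are exposed under `/proc/sys/net/ipv4`; the function name is hypothetical, and it only reads the values, it does not suggest what they should be.

```python
from pathlib import Path

# Standard Linux sysctl names for TCP keepalive behaviour.
KEEPALIVE_KEYS = (
    "tcp_keepalive_time",
    "tcp_keepalive_intvl",
    "tcp_keepalive_probes",
)


def read_keepalive_settings(proc_dir="/proc/sys/net/ipv4"):
    """Return the current keepalive values, or None where a value is unreadable."""
    settings = {}
    for key in KEEPALIVE_KEYS:
        try:
            settings[key] = int((Path(proc_dir) / key).read_text().strip())
        except (OSError, ValueError):
            # File missing (non-Linux host) or unexpected content.
            settings[key] = None
    return settings
```

Calling `read_keepalive_settings()` on every node and comparing the results would at least show whether the nodes are configured consistently.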

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/727d0b5f-1dbf-4ce6-ab11-067b20513c76%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Anil_Karaka · April 1, 2015, 7:13am

I also use the Amazon AWS cloud plugin and discover my nodes based on the
security group.

Should I switch to unicast discovery instead?

Anil_Karaka · April 1, 2015, 7:30am

[2015-04-01 07:23:09,550][INFO ][cluster.service ] [ESBigNode5] removed {[ESBigNode3][X_IyUwkrQe-ae15VVKltDw][esnode3][inet[/153.31.73.30:9300]],}, reason: zen-disco-master_failed ([ESBigNode3][X_IyUwkrQe-ae15VVKltDw][esnode3][inet[/153.31.73.30:9300]])
[2015-04-01 07:24:20,456][INFO ][cluster.service ] [ESBigNode5] detected_master [ESBigNode3][X_IyUwkrQe-ae15VVKltDw][esnode3][inet[/153.31.73.30:9300]], added {[ESBigNode3][X_IyUwkrQe-ae15VVKltDw][esnode3][inet[/153.31.73.30:9300]],}, reason: zen-disco-receive(from master [[ESBigNode3][X_IyUwkrQe-ae15VVKltDw][esnode3][inet[/153.31.73.30:9300]]])

This is how a node leaves and rejoins the cluster.
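
The gap between losing the master and detecting it again can be read straight off the timestamps. A rough sketch, assuming the log format shown above; the name `master_outages` is invented for this example, and it simply pairs each "zen-disco-master_failed" removal with the next "detected_master" entry in the same file.

```python
import re
from datetime import datetime

# Matches the start of a master-loss entry or a master-detected entry.
# The format is inferred from the two log entries quoted in this post.
EVENT = re.compile(
    r"\[(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})\]"
    r"\[INFO\s*\]\[cluster\.service\s*\]\s*\[[^\]]+\]\s*"
    r"(removed\s*\{[^}]*\},\s*reason:\s*zen-disco-master_failed|detected_master)"
)


def master_outages(log_text):
    """Return the seconds between each master loss and the next detected_master."""
    outages = []
    lost_at = None
    for stamp, event in EVENT.findall(log_text):
        when = datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S,%f")
        if event.startswith("removed"):
            lost_at = when
        elif lost_at is not None:
            outages.append((when - lost_at).total_seconds())
            lost_at = None
    return outages
```

For the two entries above this works out to roughly 70.9 seconds without a master on ESBigNode5.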

Tomer_Levy · April 7, 2015, 5:33am

We're experiencing a similar issue with one of our clusters on EC2, which
was running 1.4.4, and it still happens after upgrading to 1.5.0. We see
"master left" messages at random, and the nodes reconnect after a couple of
minutes. We have 4 data nodes and 3 master nodes (and a few client nodes).

master left (reason = do not exists on master, act as master failure)

Any thoughts?

warkolm · April 7, 2015, 6:19am

When you see this, can you check whether _cat/indices, _cat/shards and
_cat/nodes return a response?
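
A small script can run that check on demand. This is a sketch only: it assumes the node's HTTP API is reachable at a base URL such as http://localhost:9200, and the function names and the 10-second timeout are arbitrary choices for illustration.

```python
import urllib.request

CAT_ENDPOINTS = ("_cat/indices", "_cat/shards", "_cat/nodes")


def http_fetch(url, timeout):
    """Fetch a URL and return the HTTP status code."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.status


def check_cat_endpoints(base_url, fetch=http_fetch, timeout=10.0):
    """Return a dict describing whether each _cat endpoint answered."""
    results = {}
    for endpoint in CAT_ENDPOINTS:
        url = base_url.rstrip("/") + "/" + endpoint
        try:
            results[endpoint] = "HTTP %d" % fetch(url, timeout)
        except OSError as exc:
            # URLError, HTTPError and socket timeouts are all OSError subclasses.
            results[endpoint] = "failed: %s" % exc
    return results
```

Running `check_cat_endpoints("http://localhost:9200")` while a node is reported as gone would show whether any of the three calls hangs or errors out.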

Tomer_Levy · April 8, 2015, 9:33am

Link below seems like a good direction to solve the problem

rfo2006 · February 1, 2016, 7:44am

I think I'm facing the same issue - random disconnects, elastic cluster on EC2. Did you ever manage to solve this issue?

Mike_Salmon · June 29, 2017, 1:46pm

Anyone here manage to work out what this issue is? Having the same problem at the moment.

MichelZ · January 14, 2020, 6:39am

I'm experiencing a similar issue on EC2 with AWS Security Group discovery. I've only just started to look into it, but this looks very similar to my issue.

DavidTurner · January 14, 2020, 7:14am

This is a pretty old thread @MichelZ, would you start another fresh one if you want to discuss your issues?
