I grepped for "removed" in the master node's logs, and these are the entries I see.
[2015-04-01 05:32:55,813][INFO ][cluster.service ] [ESBigNode3]
removed
{[ES30GBNode2][Yf8ODQh0TE2_0hQ35Y0M_w][ip-153-31-43-55][inet[/153.31.73.55:9300]],},
reason:
zen-disco-node_failed([ES30GBNode2][Yf8ODQh0TE2_0hQ35Y0M_w][ip-153-31-73-55][inet[/153.31.73.55:9300]]),
reason transport disconnected
[2015-04-01 05:33:02,048][INFO ][cluster.service ] [ESBigNode3]
removed
{[ES30GBNode1][0CRaC261RXy8JfGc1XNLZA][ip-153-31-76-111][inet[/153.31.76.111:9300]],},
reason:
zen-disco-node_failed([ES30GBNode1][0CRaC261RXy8JfGc1XNLZA][ip-153-31-36-101][inet[/153.31.36.101:9300]]),
reason transport disconnected
[2015-04-01 05:33:09,702][INFO ][cluster.service ] [ESBigNode3]
removed
{[ESBigNode5][PaNaDPwfSM-jUpGa8HQJmQ][esnode5][inet[/153.31.70.128:9300]],},
reason:
zen-disco-node_failed([ESBigNode5][PaNaDPwfSM-jUpGa8HQJmQ][esnode5][inet[/153.31.70.128:9300]]),
reason transport disconnected
[2015-04-01 05:33:13,964][INFO ][cluster.service ] [ESBigNode3]
removed
{[ESBigNode1][ihJU17ToQVit9BxNzQjhnQ][esnode1][inet[/153.31.75.190:9300]],},
reason:
zen-disco-node_failed([ESBigNode1][ihJU17ToQVit9BxNzQjhnQ][esnode1][inet[/153.31.35.190:9300]]),
reason transport disconnected
On the data node, this is what a node leaving the cluster looks like
in its log file.
[2015-01-22 20:49:56,860][WARN ][discovery.ec2 ] [ESBigNode1]
master left (reason = do not exists on master, act as master failure),
current nodes:
{[ESBigNode2][zVdCNza9Qk-v-Usu66jcvw][ip-153-31-73-29][inet[/153.31.73.29:9300]],[ESBigNode4][-8pj8n2sS5GB4XTIE0zudQ][ip-153-31-74-230][inet[/153.31.74.230:9300]],[ESBigNode1][nU6bkV-SSb6rvLHsth9AQg][ip-153-31-75-190][inet[/153.31.75.190:9300]],}
That is 4 nodes leaving the 7-node cluster at a time, and the cluster stays
in the red state for a few minutes, not just yellow.
Although 4 nodes leaving the cluster at once is rare, single nodes leave the
cluster very often.
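To put a number on "very often", the removal events can be tallied per node instead of reading the grep output by eye. This is a minimal sketch in Python that assumes the master log uses the same line format as the excerpt above; the function names and the log path passed on the command line are hypothetical.

```python
import re
import sys
from collections import Counter

# Matches the start of a "removed" event as it appears in the master log.
# \s* tolerates the line wrapping that email adds between the fields.
REMOVED = re.compile(
    r"\[(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}),\d{3}\]"
    r"\[INFO\s*\]\[cluster\.service\s*\]\s*\[(?P<master>[^\]]+)\]\s*"
    r"removed\s*\{\[(?P<node>[^\]]+)\]"
)


def removed_nodes(log_text):
    """Return (timestamp, node_name) pairs for every 'removed' event."""
    return [(m.group("ts"), m.group("node")) for m in REMOVED.finditer(log_text)]


def removals_per_node(log_text):
    """Count how many times each node was removed from the cluster."""
    return Counter(node for _, node in removed_nodes(log_text))


if __name__ == "__main__":
    # Usage (hypothetical path): python removed.py /path/to/master.log
    with open(sys.argv[1]) as fh:
        for node, count in removals_per_node(fh.read()).most_common():
            print(node, count)
```

Grouping the timestamps by minute would also show whether the removals cluster around particular times of day.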
As discussed in this
thread, https://groups.google.com/forum/#!msg/elasticsearch/ixoAF9Yur0E/CgX4Hbk1ynYJ
I will change discovery.zen.ping.timeout to 10 seconds. What else can I do?
There is an older thread from 2012 that also suggests changing the OS settings
for IPv4 TCP keepalive. Do I also have to change these
settings? https://groups.google.com/forum/#!msg/elasticsearch/c9JmaiVfBb0/9XZM6ZJpoBwJ
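For context on the keepalive question, here is a minimal sketch for checking what a node is currently using, assuming the nodes run Linux, where the three kernel parameters are exposed under /proc/sys/net/ipv4. The function names are hypothetical. It only reads the values and does not change anything.

```python
from pathlib import Path

# Standard Linux TCP keepalive parameters (all values are integers):
#   tcp_keepalive_time   - idle seconds before the first probe is sent
#   tcp_keepalive_intvl  - seconds between unanswered probes
#   tcp_keepalive_probes - unanswered probes before the connection is dropped
KEEPALIVE_KEYS = ("tcp_keepalive_time", "tcp_keepalive_intvl", "tcp_keepalive_probes")


def read_keepalive(base="/proc/sys/net/ipv4"):
    """Read the current keepalive settings into a dict of ints."""
    return {key: int((Path(base) / key).read_text().strip()) for key in KEEPALIVE_KEYS}


def worst_case_detection_seconds(values):
    """Seconds an idle connection to a dead peer can go unnoticed."""
    return (
        values["tcp_keepalive_time"]
        + values["tcp_keepalive_intvl"] * values["tcp_keepalive_probes"]
    )


if __name__ == "__main__":
    current = read_keepalive()
    print(current)
    print("worst case (s):", worst_case_detection_seconds(current))
```

With the usual Linux defaults (7200, 75, 9) the worst case comes to 7875 seconds, a little over two hours. That long window is the usual reason for lowering these values.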