Cluster troubles, Azure related?

Tim_Heikell · September 16, 2014, 6:00pm

We are prepping to launch our app into production and seem to be having
some stability issues. We have a cluster of 4 VMs on Azure that all use the
Azure plugin for discovery. Most of the time it works as expected, but
sometimes it loses its mind. This morning, for example, I adjusted the
memory allocated to the JVM on all nodes. I rebooted the nodes one at a
time, waiting for a green status before rebooting the next node. When I
rebooted the fourth node, the cluster status turned red (according to
node 1). Node 1 reported only nodes 1 and 2 in the cluster. I waited and
nothing changed. I eventually checked the node status on node 3 and found
that nodes 3 and 4 had formed their own cluster. I ended up in a state
where nodes 1 and 2 were in one cluster, with 2 as the master, while 3 and
4 were in a separate cluster, with 3 as the master. I stopped the
elasticsearch service on 3 and 4 and then started the services again. They
correctly found the cluster of nodes 1 and 2 and all is well again. Why
would this happen, and how can I prevent it from happening? On node 3 I
found some interesting log entries that I have copied to this gist:

https://gist.github.com/theikell/9948b1d318cdc4cd0ecf

[2014-09-16 16:47:40,015][WARN ][discovery                ] [ElasticSearch03] waited for 30s and no initial state was set by the discovery
[2014-09-16 16:47:40,158][INFO ][http                     ] [ElasticSearch03] bound_address {inet[/0:0:0:0:0:0:0:0:9200]}, publish_address {inet[/10.0.0.7:9200]}
[2014-09-16 16:47:40,158][INFO ][node                     ] [ElasticSearch03] started
[2014-09-16 16:47:43,710][DEBUG][action.admin.indices.template.get] [ElasticSearch03] no known master node, scheduling a retry
[2014-09-16 16:47:51,741][DEBUG][action.admin.cluster.health] [ElasticSearch03] no known master node, scheduling a retry
[2014-09-16 16:48:09,316][INFO ][cluster.service          ] [ElasticSearch03] detected_master [ElasticSearch04][J-cmvstMTZmB5AkImd2Xsw][ElasticSearch04][inet[/10.0.0.14:9300]]{master=true}, added {[ElasticSearch01][z-QBIifMRf2RL-ohoe7zUw][ElasticSearch01][inet[/10.0.0.13:9300]]{master=true},[ElasticSearch02][K96ELjPlQium6bpEKSkxWw][ElasticSearch02][inet[/10.0.0.4:9300]]{master=true},[ElasticSearch04][J-cmvstMTZmB5AkImd2Xsw][ElasticSearch04][inet[/10.0.0.14:9300]]{master=true},}, reason: zen-disco-receive(from master [[ElasticSearch04][J-cmvstMTZmB5AkImd2Xsw][ElasticSearch04][inet[/10.0.0.14:9300]]{master=true}])
[2014-09-16 17:08:27,714][INFO ][cluster.service          ] [ElasticSearch03] removed {[ElasticSearch02][K96ELjPlQium6bpEKSkxWw][ElasticSearch02][inet[/10.0.0.4:9300]]{master=true},}, reason: zen-disco-receive(from master [[ElasticSearch04][J-cmvstMTZmB5AkImd2Xsw][ElasticSearch04][inet[/10.0.0.14:9300]]{master=true}])
[2014-09-16 17:09:06,113][INFO ][cluster.service          ] [ElasticSearch03] added {[ElasticSearch02][QhsUGOfzTM2T-JCVdNgyYg][ElasticSearch02][inet[/10.0.0.4:9300]]{master=true},}, reason: zen-disco-receive(from master [[ElasticSearch04][J-cmvstMTZmB5AkImd2Xsw][ElasticSearch04][inet[/10.0.0.14:9300]]{master=true}])
[2014-09-16 17:21:35,791][INFO ][discovery.azure          ] [ElasticSearch03] master_left [[ElasticSearch04][J-cmvstMTZmB5AkImd2Xsw][ElasticSearch04][inet[/10.0.0.14:9300]]{master=true}], reason [shut_down]
[2014-09-16 17:21:57,008][INFO ][cluster.service          ] [ElasticSearch03] master {new [ElasticSearch03][6LLQiSe9S0-B8InmPTrN-w][ElasticSearch03][inet[ElasticSearch03.sensoria-fitness-es.d5.internal.cloudapp.net/10.0.0.7:9300]]{master=true}, previous [ElasticSearch04][J-cmvstMTZmB5AkImd2Xsw][ElasticSearch04][inet[/10.0.0.14:9300]]{master=true}}, removed {[ElasticSearch04][J-cmvstMTZmB5AkImd2Xsw][ElasticSearch04][inet[/10.0.0.14:9300]]{master=true},}, reason: zen-disco-master_failed ([ElasticSearch04][J-cmvstMTZmB5AkImd2Xsw][ElasticSearch04][inet[/10.0.0.14:9300]]{master=true})

This file has been truncated.

Thanks.

Tim

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/d2c6462d-8789-4b9f-9776-ea368f7f5661%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
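
The check described in the post (asking each node which master it sees) can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, and the input mapping is meant to be filled in by hand or by whatever means you use to query each node, which is deliberately left out here. The node names are the ones that appear in the log above.

```python
from collections import defaultdict
from typing import Dict, List


def group_by_master(reported_master: Dict[str, str]) -> Dict[str, List[str]]:
    """Group node names by the master each node reports."""
    groups = defaultdict(list)
    for node, master in sorted(reported_master.items()):
        groups[master].append(node)
    return dict(groups)


def is_split(reported_master: Dict[str, str]) -> bool:
    """True when the nodes do not all agree on a single master."""
    return len(set(reported_master.values())) > 1


if __name__ == "__main__":
    # The state described in the post: nodes 1 and 2 follow node 2,
    # nodes 3 and 4 follow node 3.
    views = {
        "ElasticSearch01": "ElasticSearch02",
        "ElasticSearch02": "ElasticSearch02",
        "ElasticSearch03": "ElasticSearch03",
        "ElasticSearch04": "ElasticSearch03",
    }
    for master, members in group_by_master(views).items():
        print(master, "<-", members)
    print("split brain:", is_split(views))
```

More than one group in the output means the nodes disagree about the master, which is the situation the post ended up in.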

jprante · September 16, 2014, 6:21pm

It looks like you did not configure minimum_master_nodes.

Jörg

Tim_Heikell · September 16, 2014, 6:30pm

Thanks for the reply, Jörg. I have discovery.zen.minimum_master_nodes=2.
Should it be something different?

Tim_Heikell · September 16, 2014, 6:34pm

Ah, I just found the n/2+1 recommendation, so I expect I need to set it to
3.
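
The arithmetic behind the n/2+1 recommendation can be checked with a short Python sketch. It assumes all four nodes are master-eligible, which matches the master=true entries in the log; the function names are hypothetical and only illustrate the rule, they are not part of Elasticsearch.

```python
def minimum_master_nodes(master_eligible: int) -> int:
    """Quorum from the n/2+1 recommendation, using integer division."""
    if master_eligible < 1:
        raise ValueError("need at least one master-eligible node")
    return master_eligible // 2 + 1


def can_split(master_eligible: int, configured_minimum: int) -> bool:
    """True if two disjoint groups of nodes could each reach the minimum."""
    return 2 * configured_minimum <= master_eligible


if __name__ == "__main__":
    n = 4  # master-eligible nodes in this cluster
    print("recommended minimum:", minimum_master_nodes(n))
    print("split possible with 2:", can_split(n, 2))
    print("split possible with 3:", can_split(n, 3))
```

With a minimum of 2, two groups of two nodes can each elect a master, which is consistent with the 1+2 and 3+4 clusters described above. With a minimum of 3, at most one group can reach quorum, so the setting would become discovery.zen.minimum_master_nodes=3.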

