Master Node Failover?


(Roy Russo) #1

Hello all,

I'm having trouble getting the master node in my cluster to fail over.
The failover does appear to happen: the other nodes detect that the master
has left and promote another node to master. After that, though, the cluster
stops responding to any curl calls and the new master goes away.

Relevant log lines follow. Node4 is the master and is being shut down, and
it looks like Node1 is promoted to new master... then all hell breaks loose.

[2013-09-12 15:36:03,659][INFO ][node ] [Node4] {0.90.1}[8760]: stopping ...
[2013-09-12 15:36:03,688][INFO ][discovery.zen ] [Node1]master_left [[Node4][58mmQXeRSd6dnUzvWEXdrg][inet[/172.19.88.182:9305]]{master=true}],reason [shut_down]
[2013-09-12 15:36:03,689][INFO ][discovery.zen ] [Node5]master_left [[Node4][58mmQXeRSd6dnUzvWEXdrg][inet[/172.19.88.182:9305]]{master=true}],reason [shut_down]
[2013-09-12 15:36:03,692][INFO ][cluster.service ] [Node1] master {new [Node1][98AakXS2SgaLeFbOMYD9DQ][inet[/172.19.88.182:9304]]{master=true},previous [Node4][58mmQXeRSd6dnUzvWEXdrg][inet[/172.19.88.182:9305]]{master=true}},removed {[Node4][58mmQXeRSd6dnUzvWEXdrg][inet[/172.19.88.182:9305]]{master=true},},reason: zen-disco-master_failed ([Node4][58mmQXeRSd6dnUzvWEXdrg][inet[/172.19.88.182:9305]]{master=true})
[2013-09-12 15:36:03,749][INFO ][cluster.service ] [Node2] master {new [Node1][98AakXS2SgaLeFbOMYD9DQ][inet[/172.19.88.182:9304]]{master=true},previous [Node4][58mmQXeRSd6dnUzvWEXdrg][inet[/172.19.88.182:9305]]{master=true}},removed {[Node4][58mmQXeRSd6dnUzvWEXdrg][inet[/172.19.88.182:9305]]{master=true},},reason: zen-disco-receive(from master [[Node1][98AakXS2SgaLeFbOMYD9DQ][inet[/172.19.88.182:9304]]{master=true}])
[2013-09-12 15:36:03,835][INFO ][cluster.service ] [Node5] master {new [Node1][98AakXS2SgaLeFbOMYD9DQ][inet[/172.19.88.182:9304]]{master=true},previous [Node4][58mmQXeRSd6dnUzvWEXdrg][inet[/172.19.88.182:9305]]{master=true}},removed {[Node4][58mmQXeRSd6dnUzvWEXdrg][inet[/172.19.88.182:9305]]{master=true},},reason: zen-disco-master_failed ([Node4][58mmQXeRSd6dnUzvWEXdrg][inet[/172.19.88.182:9305]]{master=true})
[2013-09-12 15:36:03,691][INFO ][discovery.zen ] [Manga]master_left [[Node4][58mmQXeRSd6dnUzvWEXdrg][inet[/172.19.88.182:9305]]{master=true}],reason [shut_down]
[2013-09-12 15:36:03,743][INFO ][cluster.service ] [Manga] master {new [Node1][98AakXS2SgaLeFbOMYD9DQ][inet[/172.19.88.182:9304]]{master=true},previous [Node4][58mmQXeRSd6dnUzvWEXdrg][inet[/172.19.88.182:9305]]{master=true}},removed {[Node4][58mmQXeRSd6dnUzvWEXdrg][inet[/172.19.88.182:9305]]{master=true},},reason: zen-disco-receive(from master [[Node1][98AakXS2SgaLeFbOMYD9DQ][inet[/172.19.88.182:9304]]{master=true}])
[2013-09-12 15:36:03,854][INFO ][node ] [Node4] {0.90.1}[8760]: stopped
[2013-09-12 15:36:03,856][INFO ][node ] [Node4] {0.90.1}[8760]: closing ...
[2013-09-12 15:36:03,874][INFO ][node ] [Node4] {0.90.1}[8760]: closed
[2013-09-12 15:36:04,849][INFO ][discovery.zen ] [Node3]master_left [[Node4][58mmQXeRSd6dnUzvWEXdrg][inet[/172.19.88.182:9305]]{master=true}],reason [transport disconnected (with verified connect)]
[2013-09-12 15:36:04,851][INFO ][cluster.service ] [Node3] master {new [Node1][98AakXS2SgaLeFbOMYD9DQ][inet[/172.19.88.182:9304]]{master=true},previous [Node4][58mmQXeRSd6dnUzvWEXdrg][inet[/172.19.88.182:9305]]{master=true}},removed {[Node4][58mmQXeRSd6dnUzvWEXdrg][inet[/172.19.88.182:9305]]{master=true},},reason: zen-disco-receive(from master [[Node1][98AakXS2SgaLeFbOMYD9DQ][inet[/172.19.88.182:9304]]{master=true}])

The startup script I often use is:
start cmd.exe /C elasticsearch -Des.config=C:\elasticsearch\elasticsearch-0.90.1\config\elasticsearch.yml

start cmd.exe /C elasticsearch -Des.node.data=true -Des.node.name=Primus
start cmd.exe /C elasticsearch -Des.node.data=true -Des.node.name=Manga

start cmd.exe /C elasticsearch -Des.node.data=true -Des.node.name=Node1
start cmd.exe /C elasticsearch -Des.node.data=true -Des.node.name=Node2
start cmd.exe /C elasticsearch -Des.node.data=true -Des.node.name=Node3
start cmd.exe /C elasticsearch -Des.node.data=true -Des.node.name=Node4
start cmd.exe /C elasticsearch -Des.node.data=true -Des.node.name=Node5

I tried changing the stock config, but to no avail:
discovery.zen.minimum_master_nodes: 3

discovery.zen.ping.timeout: 5s
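
For reference, the usual guidance for discovery.zen.minimum_master_nodes is a majority of the master-eligible nodes, i.e. (N / 2) + 1 with integer division. A minimal sketch of that arithmetic follows; the function name is illustrative only (it is not part of Elasticsearch), and it assumes you know how many nodes in the cluster are master-eligible.

```python
def minimum_master_nodes(master_eligible: int) -> int:
    """Return the majority quorum for a given number of master-eligible nodes.

    Hypothetical helper: it only computes floor(n / 2) + 1, the value
    commonly recommended for discovery.zen.minimum_master_nodes.
    """
    if master_eligible < 1:
        raise ValueError("need at least one master-eligible node")
    return master_eligible // 2 + 1


if __name__ == "__main__":
    # Print the quorum for a few cluster sizes.
    for n in (3, 5, 7):
        print(n, "master-eligible nodes ->", minimum_master_nodes(n))
```

If the seven named nodes in the startup script are all master-eligible, this gives 4, so a setting of 3 would be below a majority. Whether that is related to the behaviour described here is not established by the logs.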

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.


(Roy Russo) #2

One thing I have noticed is that killing another node and re-adding it
seems to wake up the entire cluster so that it handles REST calls again.
From the logs, it looks like all the nodes are running under the new master,
but they are somehow unreachable until a node is removed and re-added.
Thoughts?
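
One way to narrow down which nodes stop answering REST calls is to poll the cluster health endpoint on each node's HTTP port right after the master is shut down. The sketch below is only an illustration: the base URLs are placeholders (the logs show transport ports, not HTTP ports), and the helper names are made up for this example. It uses the /_cluster/health endpoint and reads only the status, number_of_nodes and unassigned_shards fields from the response.

```python
import json
import urllib.request


def fetch_health(base_url: str, timeout: float = 5.0) -> dict:
    """GET <base_url>/_cluster/health and return the decoded JSON body."""
    url = f"{base_url}/_cluster/health"
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


def summarize(health: dict) -> str:
    """Condense a cluster health response into one line."""
    return "status={} nodes={} unassigned_shards={}".format(
        health.get("status"),
        health.get("number_of_nodes"),
        health.get("unassigned_shards"),
    )


def poll(base_urls, timeout: float = 5.0) -> dict:
    """Query every node; record a summary or the reason it was unreachable."""
    results = {}
    for base_url in base_urls:
        try:
            results[base_url] = summarize(fetch_health(base_url, timeout))
        except (OSError, ValueError) as exc:
            # OSError covers refused connections and timeouts (URLError is a
            # subclass); ValueError covers a body that is not valid JSON.
            results[base_url] = f"unreachable: {exc}"
    return results


# Example (hypothetical HTTP ports, adjust to the real ones):
# for node, line in poll(["http://localhost:9200", "http://localhost:9201"]).items():
#     print(node, line)
```

Running this before and after stopping the master, and again after removing and re-adding a node, would show whether the nodes refuse connections outright or accept them and then time out.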


