Complete cluster failure

I have been running Elasticsearch for years and I have never encountered a
collapse such as the one I am experiencing now. Even with split-brain
clusters, I still had it running and accepting search requests.

8-node development cluster running 0.90.2 using multicast discovery. The last
time the cluster was fully restarted was probably when it was upgraded to
0.90.2 (July 2013?). All nodes are master and data enabled.

I decided to upgrade to Java 7u25 from 7u04. Clients were upgraded first
with no issues. Restarted 2 nodes on the cluster, once again, no issues.
Attempting to restart the next two wreaked havoc on the cluster. Only 5
nodes were able to form a cluster. The other 3 nodes were not able to join.
Disabled gateway options, removed some plugins. Nothing. The same 3 would
not join.

After resigning myself to the fact that I must have bad state files, I
decided to remove the data directory and restart from scratch. The nodes
still output the same message over and over again:

[2014-03-18 21:41:18,333][DEBUG][monitor.network ] [search5]
net_info
host [srch-dv105]
eth0 display_name [eth0]
address [/fe80:0:0:0:250:56ff:feba:9b%2] [/192.168.50.105]
mtu [1500] multicast [true] ptp [false] loopback [false] up [true] virtual
[false]
lo display_name [lo]
address [/0:0:0:0:0:0:0:1%1] [/127.0.0.1]
mtu [16436] multicast [false] ptp [false] loopback [true] up [true] virtual
[false]
...
[2014-03-18 21:24:19,414][INFO ][node ] [search5]
{0.90.2}[30297]: initialized
[2014-03-18 21:24:19,414][INFO ][node ] [search5]
{0.90.2}[30297]: starting ...
[2014-03-18 21:24:19,459][DEBUG][netty.channel.socket.nio.SelectorUtil]
Using select timeout of 500
[2014-03-18 21:24:19,461][DEBUG][netty.channel.socket.nio.SelectorUtil]
Epoll-bug workaround enabled = false
[2014-03-18 21:24:19,953][DEBUG][transport.netty ] [search5] Bound
to address [/0:0:0:0:0:0:0:0:9300]
[2014-03-18 21:24:19,966][INFO ][transport ] [search5]
bound_address {inet[/0:0:0:0:0:0:0:0:9300]}, publish_address {inet[/
192.168.50.105:9300]}
[2014-03-18 21:24:25,124][DEBUG][discovery.zen ] [search5]
filtered ping responses: (filter_client[true], filter_data[false]) {none}
[2014-03-18 21:24:30,172][DEBUG][discovery.zen ] [search5]
filtered ping responses: (filter_client[true], filter_data[false]) {none}
[2014-03-18 21:24:35,176][DEBUG][discovery.zen ] [search5]
filtered ping responses: (filter_client[true], filter_data[false]) {none}
[2014-03-18 21:24:40,276][DEBUG][discovery.zen ] [search5]
filtered ping responses: (filter_client[true], filter_data[false]) {none}
[2014-03-18 21:24:45,280][DEBUG][discovery.zen ] [search5]
filtered ping responses: (filter_client[true], filter_data[false]) {none}
[2014-03-18 21:24:50,017][WARN ][discovery ] [search5]
waited for 30s and no initial state was set by the discovery
[2014-03-18 21:24:50,019][INFO ][discovery ] [search5]
development/0iuC15VyQ32GdRbZ3kzLLQ
[2014-03-18 21:24:50,020][DEBUG][gateway ] [search5] can't
wait on start for (possibly) reading state from gateway, will do it
asynchronously
[2014-03-18 21:24:50,063][INFO ][http ] [search5]
bound_address {inet[/0:0:0:0:0:0:0:0:9200]}, publish_address {inet[/
192.168.50.105:9200]}
[2014-03-18 21:24:50,064][INFO ][node ] [search5]
{0.90.2}[30297]: started
[2014-03-18 21:24:50,283][DEBUG][discovery.zen ] [search5]
filtered ping responses: (filter_client[true], filter_data[false]) {none}
[2014-03-18 21:24:55,287][DEBUG][discovery.zen ] [search5]
filtered ping responses: (filter_client[true], filter_data[false]) {none}
[2014-03-18 21:25:00,290][DEBUG][discovery.zen ] [search5]
filtered ping responses: (filter_client[true], filter_data[false]) {none}
[2014-03-18 21:25:05,294][DEBUG][discovery.zen ] [search5]
filtered ping responses: (filter_client[true], filter_data[false]) {none}
[2014-03-18 21:25:10,297][DEBUG][discovery.zen ] [search5]
filtered ping responses: (filter_client[true], filter_data[false]) {none}
[2014-03-18 21:25:15,301][DEBUG][discovery.zen ] [search5]
filtered ping responses: (filter_client[true], filter_data[false]) {none}
[2014-03-18 21:25:20,305][DEBUG][discovery.zen ] [search5]
filtered ping responses: (filter_client[true], filter_data[false]) {none}
[2014-03-18 21:25:25,309][DEBUG][discovery.zen ] [search5]
filtered ping responses: (filter_client[true], filter_data[false]) {none}
[2014-03-18 21:25:30,312][DEBUG][discovery.zen ] [search5]
filtered ping responses: (filter_client[true], filter_data[false]) {none}
...

Attempting to bring up other nodes results in the same kind of output,
repeated over and over, but slightly different:
[2014-03-18 21:46:31,200][DEBUG][discovery.zen ] [search8]
filtered ping responses: (filter_client[true], filter_data[false])
--> target [[search6][T3tINFmqREK9W6oqZV0r7A][inet[/192.168.50.106:9300]]],
master [null]
--> target [[search12][DSFGqzbVR1uZYT-63sBAcg][inet[/192.168.50.112:9300]]],
master [null]
[2014-03-18 21:46:36,208][DEBUG][discovery.zen ] [search8]
filtered ping responses: (filter_client[true], filter_data[false])
--> target [[search6][T3tINFmqREK9W6oqZV0r7A][inet[/192.168.50.106:9300]]],
master [null]
--> target [[search12][DSFGqzbVR1uZYT-63sBAcg][inet[/192.168.50.112:9300]]],
master [null]
[2014-03-18 21:46:41,210][DEBUG][discovery.zen ] [search8]
filtered ping responses: (filter_client[true], filter_data[false])
--> target [[search6][T3tINFmqREK9W6oqZV0r7A][inet[/192.168.50.106:9300]]],
master [null]
--> target [[search12][DSFGqzbVR1uZYT-63sBAcg][inet[/192.168.50.112:9300]]],
master [null]

Note that search5 (above) does not see these other nodes (search6 and
search12).

Even with a new data directory, a node still will not join the cluster and
keeps logging the messages above. The only change was upgrading from Java
7u04 to 7u25, and two nodes were restarted without issues. Unfortunately, I
did not bother to look at which node was the master at the time.

elasticsearch.yml https://gist.github.com/brusic/98bdf84b07cadfc2afdf
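
As far as discovery goes there is nothing exotic in that config: multicast
with, as far as I recall, the stock settings, which amount to roughly this
(sketch from memory, the gist above is the real thing):

cluster.name: development
discovery.zen.ping.multicast.enabled: true
discovery.zen.ping.multicast.group: 224.2.2.4   # the default group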

Luckily this cluster is only development, but my plan was to upgrade our
production environments as well.

Cheers,

Ivan

No matter in what order I restart the servers, the same 4-node clusters get
created. I suspect the network, especially since there was some work done
this past Friday on the underlying VM host. Would Elasticsearch cache
multicast information? The servers have not been restarted in at least a week.

Ivan

How many NICs are there on each of your nodes? We had some issues on boxes
with 4 NICs, where some addresses were not reachable due to a Linux kernel
setting. I'd suggest you test the full connection matrix via a shell script,
so as to rule out this cause.
My 2 cents

My mind was not clear since I was debugging this issue for a few hours.
Once I realized it was a multicast issue, I switched to unicast and the
cluster was back up and running. So it was multicast after all. I should
have been more careful when I received an email on Friday that said
" will have to wait till early next week due to errors on the
host." Errors on the host? I should have made them explain themselves.

I do not have control over the sysadmin aspects of the system. If I did, we
would be running the latest stable release of Java, Elasticsearch, ...

Thanks,

Ivan

Yeah, in case anyone reads this thread in the future, this log output is a
good indicator of multicast problems. You can see that the nodes are
pinging and talking to each other on this log line:

--> target [[search6][T3tINFmqREK9W6oqZV0r7A][inet[/192.168.50.106:9300]]],
master [null]

That's basically a ping response from a node saying "Hey! I'm alive, but I
dunno who the master is yet". If the nodes were unable to communicate,
you'd see failed ping responses instead. But they are unable to fully
connect and establish/elect a master, so they all just sit around and
ping each other for ages until giving up. It isn't always a multicast issue
when you see logs like this, but it usually is. You can usually
diagnose this by manually telnetting between nodes on port 9300... if the
connection isn't refused, it's probably a multicast discovery issue.

The other common culprit is an IPv6 issue, often when you have multiple NICs.
You'll see the publish address bind to IPv6 and the bind address to
IPv4... and the whole cluster goes to hell because the nodes can see each
other but not communicate.

Discovery issues are generally fixed by disabling multicast and forcing
IPv4 in my experience.
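
For the IPv4 half, the usual knob is network.host in elasticsearch.yml; a
minimal sketch (the interface name below is just an example, use whichever
NIC you actually publish on):

network.host: _eth0:ipv4_
# and, if the JVM still prefers IPv6, add -Djava.net.preferIPv4Stack=true to JAVA_OPTS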

Glad you got it working again, Ivan! :)

Responses inline.

On Wed, Mar 19, 2014 at 7:25 PM, Zachary Tong zacharyjtong@gmail.com wrote:

Yeah, in case anyone reads this thread in the future, this log output is a
good indicator of multicast problems. You can see that the nodes are
pinging and talking to each other on this log line:

--> target [[search6][T3tINFmqREK9W6oqZV0r7A][inet[/192.168.50.106:9300]]],
master [null]

That's basically a ping response from a node saying "Hey! I'm alive, but
I dunno who the master is yet". If the nodes were unable to communicate,
you'd see failed ping responses instead. But they are unable to fully
connect and establish/elect a master, so they all just sit around
and ping each other for ages until giving up. It isn't always a multicast
issue when you see logs like this, but it usually is. You can usually
diagnose this by manually telnetting between nodes on port 9300... if the
connection isn't refused, it's probably a multicast discovery issue.

Once everything was stabilized and I was able to get some sleep, I looked
at the code (ZenDiscovery.java) the next day and realized exactly what you
said. Everything has been running on the existing infrastructure since
2012, so I did not initially put blame on multicast discovery.

The other common culprit is an IPv6 issue, often when you have multiple NICs.
You'll see the publish address bind to IPv6 and the bind address to
IPv4... and the whole cluster goes to hell because the nodes can see each
other but not communicate.

Once I realized it was a multicast issue, I blamed the sysadmins because it
wasn't my fault. :) I assumed it was an issue communicating between the
underlying VM hosts, but I was able to replicate the issue on physical
machines as well. The sysadmin suspected IPv6. We are indeed running IPv6,
and looking at the logs in the initial post, the bound and publish addresses
are indeed IPv6 and IPv4 respectively. I tried
setting java.net.preferIPv4Stack=true in JAVA_OPTS, but it did not make a
difference. Ultimately I decided that even if we discovered and fixed the
culprit, we might get bitten again in the future, so I switched to unicast.
I now have to keep track of different config files in source control
instead of one.

Discovery issues are generally fixed by disabling multicast and forcing
IPv4 in my experience.

How would you force IPv4? I tried using preferIPv4Stack and setting
network.host to eth0:ipv4, but it still did not work. I even switched off
iptables at one point!

Glad you got it working again, Ivan! :)

You and me both!

Cheers,

Ivan

Nice post-mortem, thanks for the writeup. Hopefully someone will stumble
on this in the future and avoid the same headache you had :)

How would you force IPv4? I tried using preferIPv4Stack and setting
network.host to eth0:ipv4, but it still did not work. I even switched off
iptables at one point!

Hmm...that's interesting. I would have recommended those two exact
methods. I'll do some digging and see why they didn't work...

-Z

Don't bother digging deeper, since I suspect the network.

I tried many different configurations while trying to pinpoint the problem,
so I did not write down the various states, just the successes/failures.
Using the described methods, IPv4 was indeed working, but multicast was
still not cooperating on the test cluster (bound_address
{inet[/192.168.50.124:9300]}, publish_address {inet[/192.168.50.124:9300]}).
Two out of the eight nodes refused to see the other six, but were able to
talk to each other. Same subnet with iptables disabled.

I am transitioning to unicast, just writing things down in case someone
has similar problems in the future.

--
Ivan

Just another update, since others have had issues with multicast in the
past and switched to unicast.

My issue appears to be with the multicast group. The default in
Elasticsearch is 224.2.2.4, which according to the RFC is within the SDP/SAP
Block. Our internal application uses mDNS, which is within the Local Network
Control Block, without issues. After switching the multicast group to an
unassigned IP within the Local Network Control Block or the Administratively
Scoped Block, Elasticsearch works fine with multicast.
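
In elasticsearch.yml terms that is just the group setting; something like the
following, where the address is only an example (pick one that suits your
network):

discovery.zen.ping.multicast.group: 239.1.2.4
# any unused address in 239.0.0.0/8 (administratively scoped) instead of the default 224.2.2.4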

Of course, it still does not explain why multicast failed despite the
cluster being operational for a long time. I do not have insight into the
network beyond what ifconfig tells me. My guess is that the cluster had a
shared state of valid nodes which was lost when I restarted three nodes at
a time instead of the routine single server restart.

I am still switching to unicast, but also keeping multicast enabled with a
different multicast group. Hopefully others will gain some knowledge from
these replies.

--
Ivan
