I have been running Elasticsearch for years and have never encountered a
collapse like the one I am experiencing now. Even during split-brain
incidents, the cluster kept running and accepting search requests.
This is an 8-node development cluster running 0.90.2 with multicast
discovery. The last full cluster restart was probably the upgrade to 0.90.2
(July 2013?). All nodes are master- and data-enabled.
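For anyone checking my setup: multicast discovery in 0.90.x comes down to a
handful of zen settings. As far as I know these are the defaults (my full
elasticsearch.yml is in the gist linked at the end):

# zen multicast discovery settings with, to my knowledge, their 0.90.x defaults
discovery.zen.ping.multicast.enabled: true
discovery.zen.ping.multicast.group: 224.2.2.4
discovery.zen.ping.multicast.port: 54328
discovery.zen.ping.multicast.ttl: 3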
I decided to upgrade from Java 7u04 to 7u25. Clients were upgraded first
with no issues. Restarting two nodes on the cluster likewise caused no
issues. Attempting to restart the next two wreaked havoc on the cluster:
only 5 nodes were able to form a cluster, and the other 3 could not join. I
disabled the gateway options and removed some plugins. Nothing. The same 3
would not join.
After resigning myself to the fact that I must have bad state files, I
decided to remove the data directory and restart from scratch. The nodes
still output the same messages over and over again:
[2014-03-18 21:41:18,333][DEBUG][monitor.network ] [search5] net_info
host [srch-dv105]
eth0 display_name [eth0]
     address [/fe80:0:0:0:250:56ff:feba:9b%2] [/192.168.50.105]
     mtu [1500] multicast [true] ptp [false] loopback [false] up [true] virtual [false]
lo   display_name [lo]
     address [/0:0:0:0:0:0:0:1%1] [/127.0.0.1]
     mtu [16436] multicast [false] ptp [false] loopback [true] up [true] virtual [false]
...
[2014-03-18 21:24:19,414][INFO ][node ] [search5] {0.90.2}[30297]: initialized
[2014-03-18 21:24:19,414][INFO ][node ] [search5] {0.90.2}[30297]: starting ...
[2014-03-18 21:24:19,459][DEBUG][netty.channel.socket.nio.SelectorUtil] Using select timeout of 500
[2014-03-18 21:24:19,461][DEBUG][netty.channel.socket.nio.SelectorUtil] Epoll-bug workaround enabled = false
[2014-03-18 21:24:19,953][DEBUG][transport.netty ] [search5] Bound to address [/0:0:0:0:0:0:0:0:9300]
[2014-03-18 21:24:19,966][INFO ][transport ] [search5] bound_address {inet[/0:0:0:0:0:0:0:0:9300]}, publish_address {inet[/192.168.50.105:9300]}
[2014-03-18 21:24:25,124][DEBUG][discovery.zen ] [search5] filtered ping responses: (filter_client[true], filter_data[false]) {none}
[2014-03-18 21:24:30,172][DEBUG][discovery.zen ] [search5] filtered ping responses: (filter_client[true], filter_data[false]) {none}
[2014-03-18 21:24:35,176][DEBUG][discovery.zen ] [search5] filtered ping responses: (filter_client[true], filter_data[false]) {none}
[2014-03-18 21:24:40,276][DEBUG][discovery.zen ] [search5] filtered ping responses: (filter_client[true], filter_data[false]) {none}
[2014-03-18 21:24:45,280][DEBUG][discovery.zen ] [search5] filtered ping responses: (filter_client[true], filter_data[false]) {none}
[2014-03-18 21:24:50,017][WARN ][discovery ] [search5] waited for 30s and no initial state was set by the discovery
[2014-03-18 21:24:50,019][INFO ][discovery ] [search5] development/0iuC15VyQ32GdRbZ3kzLLQ
[2014-03-18 21:24:50,020][DEBUG][gateway ] [search5] can't wait on start for (possibly) reading state from gateway, will do it asynchronously
[2014-03-18 21:24:50,063][INFO ][http ] [search5] bound_address {inet[/0:0:0:0:0:0:0:0:9200]}, publish_address {inet[/192.168.50.105:9200]}
[2014-03-18 21:24:50,064][INFO ][node ] [search5] {0.90.2}[30297]: started
[2014-03-18 21:24:50,283][DEBUG][discovery.zen ] [search5] filtered ping responses: (filter_client[true], filter_data[false]) {none}
... (the same "filtered ping responses ... {none}" line then repeats every
five seconds: 21:24:55, 21:25:00, 21:25:05, 21:25:10, 21:25:15, 21:25:20,
21:25:25, 21:25:30, and so on indefinitely)
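The {none} ping responses suggest the multicast pings never arrive at all.
A raw multicast smoke test, completely independent of Elasticsearch, should
show whether this is an Elasticsearch problem or a network/JVM one.
Something like this untested sketch would do it, using the 0.90 default
group and port (adjust if your config overrides them):

import java.net.DatagramPacket;
import java.net.InetAddress;
import java.net.MulticastSocket;

// Minimal multicast smoke test against the Elasticsearch 0.90 defaults
// (group 224.2.2.4, port 54328, TTL 3). Run "java MulticastCheck" on
// each node and "java MulticastCheck send" on one of them; listeners
// that stay silent mean multicast is broken below Elasticsearch.
public class MulticastCheck {
    public static void main(String[] args) throws Exception {
        InetAddress group = InetAddress.getByName("224.2.2.4");
        int port = 54328;
        if (args.length > 0 && "send".equals(args[0])) {
            // sender: fire a single datagram at the group
            byte[] payload = "es-multicast-check".getBytes("UTF-8");
            MulticastSocket sock = new MulticastSocket();
            sock.setTimeToLive(3); // same TTL as the zen default
            sock.send(new DatagramPacket(payload, payload.length, group, port));
            sock.close();
            System.out.println("sent one datagram to " + group + ":" + port);
        } else {
            // listener: join the group and print every datagram seen
            MulticastSocket sock = new MulticastSocket(port);
            sock.joinGroup(group);
            byte[] buf = new byte[1024];
            while (true) {
                DatagramPacket packet = new DatagramPacket(buf, buf.length);
                sock.receive(packet);
                System.out.println("received " + packet.getLength()
                        + " bytes from " + packet.getAddress());
            }
        }
    }
}

If listeners on the affected nodes never see the sender's datagrams, the
problem is in the network or JVM stack rather than in Elasticsearch.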
Bringing up the other nodes results in the same kind of repeated output,
but slightly different:
[2014-03-18 21:46:31,200][DEBUG][discovery.zen ] [search8] filtered ping responses: (filter_client[true], filter_data[false])
    --> target [[search6][T3tINFmqREK9W6oqZV0r7A][inet[/192.168.50.106:9300]]], master [null]
    --> target [[search12][DSFGqzbVR1uZYT-63sBAcg][inet[/192.168.50.112:9300]]], master [null]
[2014-03-18 21:46:36,208][DEBUG][discovery.zen ] [search8] filtered ping responses: (filter_client[true], filter_data[false])
    --> target [[search6][T3tINFmqREK9W6oqZV0r7A][inet[/192.168.50.106:9300]]], master [null]
    --> target [[search12][DSFGqzbVR1uZYT-63sBAcg][inet[/192.168.50.112:9300]]], master [null]
[2014-03-18 21:46:41,210][DEBUG][discovery.zen ] [search8] filtered ping responses: (filter_client[true], filter_data[false])
    --> target [[search6][T3tINFmqREK9W6oqZV0r7A][inet[/192.168.50.106:9300]]], master [null]
    --> target [[search12][DSFGqzbVR1uZYT-63sBAcg][inet[/192.168.50.112:9300]]], master [null]
Note that search5 (above) does not see these other nodes (search6 and
search12).
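Since search8 can at least reach search6 and search12 directly over 9300,
taking multicast out of the equation entirely seems worth a try. Something
along these lines should force unicast discovery on 0.90.x (the host list
here is illustrative, taken from the IPs in the logs; the real list would
cover all eight nodes):

discovery.zen.ping.multicast.enabled: false
discovery.zen.ping.unicast.hosts: ["192.168.50.105", "192.168.50.106", "192.168.50.112"]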
Even with a new data directory, a node still starts but never joins,
logging the messages above constantly. The only change was the upgrade from
Java 7u04 to 7u25, and two nodes were restarted on the new JVM without
issues. Unfortunately, I did not note which node was the master at the time.
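One detail that may or may not matter: the net_info output lists the IPv6
link-local address ahead of the IPv4 one. If the 7u04-to-7u25 upgrade
changed the JVM's interface or protocol selection, forcing IPv4 with
-Djava.net.preferIPv4Stack=true in the JVM options would be a cheap way to
rule that out; I mention it only as a theory.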
elasticsearch.yml https://gist.github.com/brusic/98bdf84b07cadfc2afdf
Luckily this is only a development cluster, but the plan was to upgrade our
production environments as well.
Cheers,
Ivan