I upgraded my cluster from 0.19.8 to 0.20.2 and was getting split-brain
confusion. Here is what I did:
In config/elasticsearch.yml I have the following (this is the same before and
after the upgrade):
path.data: /mnt/sda2/data/
path.logs: /mnt/sda2/logs/elasticsearch
node.master: true
node.data: true
discovery.zen.ping.multicast.enabled: false
discovery.zen.ping.unicast.hosts: ["192.168.1.9", "192.168.1.8", "129.168.1.7", "192.168.1.6"]
So I stopped all nodes (running 0.19.8), upgraded to 0.20.2, then started all
nodes (I assume this is what is referred to as 'a full cluster restart'?).
When I noticed high CPU usage for 30 minutes I investigated using BigDesk
and saw that the nodes had split into separate clusters.
I did not find any reference in the changelogs to unicast or multicast, so I
find this behaviour strange. I had to revert back to 0.19.8 to bring things
back to sanity.
Anyone have any insight into this behaviour?
Btw, the reason I'm using unicast is that the network does not support multicast.
Ok, so unicast and multicast are too scary for anyone to touch, but can
someone please at least clarify what is meant by "a full cluster restart"?
Am I correct in assuming that all nodes need to be shut down, upgraded, then
started, or can I upgrade the nodes one by one while the others are still
running to minimize downtime?
IMHO a full cluster restart is a global shutdown, an upgrade of all the nodes, and a restart. So, yes, you will have downtime, but perhaps under a minute if you have prepared everything beforehand...
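In practice that boils down to something like this (a rough sketch, assuming tarball installs and the nodes shutdown API available in 0.19/0.20; paths and hosts are placeholders):

curl -XPOST 'http://localhost:9200/_cluster/nodes/_shutdown'   # stop every node in the cluster
# install 0.20.2 on each machine, reusing the same elasticsearch.yml and path.data
bin/elasticsearch                                               # start each node again; the cluster reforms and recovers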
HTH
--
David
Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs
After that I managed to get all nodes into one cluster. At some point one of
the nodes was unresponsive; after I tried to restart it, it kept failing to
start, and restarting the other nodes also failed. I panicked, stopped all
nodes, switched back to 0.19.8 and tried to start again, but all nodes still
failed. At this point people were asking some very serious questions... Well,
to cut a long story short, I looked into the logs and they complained of a
special character not being allowed in the yml file. In one of the comments I
found a French character ("à la raid"); I changed it to a normal "a" and
managed to start all nodes (on 0.20.2) fine, and the status was green in a
couple of minutes.
I can't understand why the conf file suddenly caused the nodes to stop
working. I can only assume that when I fixed the IPs the file's encoding was
changed (probably to UTF-8) and ES or Java didn't like that.
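If it helps anyone else, a quick way to spot this kind of thing before starting a node (just a sketch; point it at your own config path) is to check the file's charset and grep for non-ASCII bytes:

file -bi config/elasticsearch.yml                   # reports e.g. charset=us-ascii vs charset=utf-8
grep -nP '[^\x00-\x7F]' config/elasticsearch.yml    # lists any lines containing non-ASCII characters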
All seems well now and things are running fine. I have been hounding the
network administrators for an answer as to whether multicast is enabled and,
if not, why it isn't possible, but have not managed to get a straight,
coherent answer.
Regards
GX
On Sunday, January 27, 2013 12:34:02 PM UTC+2, kimchy wrote:
Did you manage to resolve this? A full cluster restart is restarting all the
nodes and starting them with the new version. Unicast disco works the same as
in 0.19.
No, I'm not using Windows, I'm using Slackware (both development and
production), but that is the exact error I had. I may have copied the 0.19.8
config yml file over to 0.20.2 to keep my settings..
Thanks for the input
GX