I had a cluster of 2 x 2GB Rackspace cloud boxes running ES 0.18.5 for
the last 3 weeks. It's been working great and we've had several
million records inserted and deleted in that time.
Here is the config:
Yesterday I updated the boxes to 0.18.6 (and was trying to get the
boxes to log to syslog as well) - it appears that something didn't
work so well during the upgrade and I was left with 2 boxes both
thinking that they're masters.
Here are the logs from the boxes during the upgrade:
And then from the next day:
I tried to get them to re-connect, but I couldn't get anything to work
correctly - they were both completely separate.
I now have a single box with the correct index:
I have a chef recipe that builds new elasticsearch boxes so I spun up
a new box and tried to get it to join the cluster, but no dice - it's
like none of the other boxes exist. I've also tried to go back down to
0.18.5 - no dice.
Is there a way I can point a new box at that master directly and say:
"Hey you're a slave, there is the master."
I'm a bit of a loss here - and don't want to admit defeat - but I'm a
little lost here.
It's a system that we're building and I CAN lose the data, but I
really want to understand:
- Why this happened.
- How I can recover from this.
- How I can prevent this from happening in the future.
Thanks in advance if anybody can point to something I will greatly
appreciate it.