I'm planning a rolling upgrade of our ES cluster from ES 1.3.2 to ES 1.5.0, hosted on AWS.
I read the instructions at Upgrade Elasticsearch | Elasticsearch Guide [8.11] | Elastic, which seem pretty straightforward, but they don't address issues like using an automatic provisioning system like Puppet/Chef, or rebuilding nodes after a disaster in general.
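If I read that guide right, the heart of the procedure is disabling shard allocation before each node restart and re-enabling it afterwards. Here is a minimal sketch of that step, assuming the Python requests library and a node reachable on localhost:9200 (both assumptions on my part):

```python
# Toggle shard allocation around a node restart, per the rolling upgrade guide.
# The host/port and the use of 'requests' are assumptions; adjust as needed.
import requests

ES = "http://localhost:9200"

def set_allocation(value):
    # value is "none" to park shards during the restart, "all" to resume.
    resp = requests.put(
        ES + "/_cluster/settings",
        json={"transient": {"cluster.routing.allocation.enable": value}},
    )
    resp.raise_for_status()

set_allocation("none")   # before stopping the node
# ... stop the node, upgrade it, start it again ...
set_allocation("all")    # after the node has rejoined the cluster
```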
What I'd like to do is to be able to rebuild dead nodes from scratch using Puppet, taking advantage of the flexibility of the cloud and of the fact that we put all our ES data on a separate detachable volume ("/dev/sdb").
For this, I'd like to be able to do something like:
1. Stop replication (to avoid unnecessary shard reshuffling, as suggested in the link above).
2. Stop the ES software on that node.
3. Stop the EC2 instance (not terminate it) and detach the data volume from it.
4. Use some CloudFormation, UserData and Puppet magic to bring up a new EC2 instance and attach the data volume to it (a rough sketch of steps 3 and 4 follows this list).
5. Configure the new data node with Puppet, installing ES 1.5.0 on it.
6. Start ES, have it join the cluster, and make the cluster aware that it still has the data from the old node's volume.
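As an illustration of steps 3 and 4, the volume hand-off could look something like this with boto3; all IDs, the region, and the device name are placeholders, and the new instance is assumed to be running already:

```python
# Sketch: move the ES data volume from a dead node to its replacement.
# IDs, region and device name below are placeholders, not real values.
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")  # assumed region

OLD_INSTANCE = "i-0123456789abcdef0"   # node being retired
NEW_INSTANCE = "i-0fedcba9876543210"   # freshly provisioned node
DATA_VOLUME = "vol-0123456789abcdef0"  # the /dev/sdb data volume

# Stop (not terminate) the old instance, then free its data volume.
ec2.stop_instances(InstanceIds=[OLD_INSTANCE])
ec2.get_waiter("instance_stopped").wait(InstanceIds=[OLD_INSTANCE])

ec2.detach_volume(VolumeId=DATA_VOLUME, InstanceId=OLD_INSTANCE)
ec2.get_waiter("volume_available").wait(VolumeIds=[DATA_VOLUME])

# Attach it to the new instance under the same device name so Puppet
# can mount it at the usual ES data path.
ec2.attach_volume(VolumeId=DATA_VOLUME, InstanceId=NEW_INSTANCE, Device="/dev/sdb")
ec2.get_waiter("volume_in_use").wait(VolumeIds=[DATA_VOLUME])
```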
But my colleague tells me that this won't work because, as far as he can tell, a shard is also identified by the EC2 instance ID (the "i-xxxxxxx" identifier), so the new instance will not be recognized when it tells the cluster that it has the old node's data.
Is this correct?
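For reference, as far as I can tell nothing in our node configuration refers to the instance ID; the node gets a logical name, and the shard data simply lives under path.data on the detachable volume. Something like this (values illustrative):

```yaml
# elasticsearch.yml (illustrative values)
cluster.name: our-cluster
node.name: es-data-01      # stable logical name, not tied to the i-xxxxxxx ID
path.data: /mnt/es-data    # mount point of the /dev/sdb volume
```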
What do others here do to address EC2 instance loss? Just replicate, even if the data is still available on the separate volume?