Due to some stuff that happened, I have an Elasticsearch node with a lot of data that is no longer in sync with the rest of the cluster.
What I need to do is somehow start this node as a separate cluster, without it needing the voting results of two other nodes, and without it joining the existing cluster.
One more thing: if it's not possible to start it as a single node, is it possible to create new voting-only masters and join them to the new cluster so it starts up?
How did you end up in this situation? Why can the node not rejoin the cluster?
The filesystem got corrupted, so we spent a few days recovering it... A lot of stuff happened, but long story short: if it joins the cluster it will delete its indices, because they are no longer present on the current cluster (they were deleted there).
I suspect you will need to use the elasticsearch-node tool, but note that this comes with warnings and is unsafe. I will not be able to help with this as I have fortunately not had to use it, but maybe someone else can help if you have issues or questions around the docs.
Thanks @Christian_Dahlqvist, so far so good.
The unsafe-bootstrap option allowed me to start the node with a different cluster name, and the data is safe and intact.
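For anyone following along, this is roughly what I ran. Paths assume a tar install, and `recovered-cluster` is just a placeholder name I picked; adjust for your setup.

```sh
# Stop Elasticsearch on the node first, then run the tool as the same
# user that owns the data directory. It prints a big warning and asks
# for confirmation before unsafely bootstrapping a new one-node cluster.
./bin/elasticsearch-node unsafe-bootstrap

# Give the node a different cluster name so it can never rejoin the old
# cluster, then start it back up.
echo "cluster.name: recovered-cluster" >> config/elasticsearch.yml
./bin/elasticsearch
```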
Since that node is now its own cluster, could you use Logstash to move the data between the two clusters? It might not be quick, but you'd have good visibility of it working as the process runs.
We did a lot of testing on a massive cluster we have and Logstash is only good for a small amount of data. Same goes for a remote reindex (since slicing is not supported).
No matter how much we tuned it, a single Logstash instance couldn't go over 60k docs/s ingesting, and a remote reindex capped out at about 30-35k docs/s.
Another drawback of Logstash is that it has no state when using Elasticsearch as both input and output. If it restarts or crashes and you have to restart it, it will re-read all the indices (assuming you gave it a wildcard). If you go index by index, you have to monitor it constantly, which is also not ideal.
For smaller amounts of data, the best bet imo is a remote reindex, as it can be monitored via the task API.
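A rough sketch of how that looks, assuming the new cluster is local, `old-cluster:9200` is the source (it must be listed in `reindex.remote.whitelist` on the destination), and `my-index` is a placeholder:

```sh
# Kick off the remote reindex without waiting, so a task ID is returned
curl -XPOST "localhost:9200/_reindex?wait_for_completion=false" \
  -H 'Content-Type: application/json' -d'
{
  "source": {
    "remote": { "host": "http://old-cluster:9200" },
    "index": "my-index"
  },
  "dest": { "index": "my-index" }
}'

# Then poll progress with the task ID from the response above
curl -XGET "localhost:9200/_tasks/<task_id>"
```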
In big clusters, snapshot/restore is by FAR the best bet.
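Sketch of the snapshot/restore flow, assuming a shared-filesystem repository at `/mnt/backups/my_repo` (which must be under `path.repo` in `elasticsearch.yml` on every node) and placeholder names throughout:

```sh
# On the source cluster: register the repository and take a snapshot
curl -XPUT "localhost:9200/_snapshot/my_repo" \
  -H 'Content-Type: application/json' -d'
{ "type": "fs", "settings": { "location": "/mnt/backups/my_repo" } }'

curl -XPUT "localhost:9200/_snapshot/my_repo/snap-1?wait_for_completion=false"

# On the target cluster: register the same repository, then restore
curl -XPOST "localhost:9200/_snapshot/my_repo/snap-1/_restore" \
  -H 'Content-Type: application/json' -d'
{ "indices": "my-index" }'
```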