Hello, I've got two clusters. One is nice and fast with SSD storage and lean indexes (although many of them); if I cycle a node in that cluster it will recover from yellow back to green in about 30 minutes. The other uses spindles, the data is unrefined and clunky, and the shard-to-node ratio is way out, but it runs; if I were to cycle a node there, though, I'd be looking at 12-24 hours for the cluster to go green again.
The hardware in the larger-capacity cluster is not great, but it's being given a refresh: I've got a replacement bank of servers coming and I'm going to be doing a rolling decommission of the old nodes. There are also plans to update the hardware on the faster cluster, but its new hardware will go into the slower cluster and the data will be reindexed across.
The plan will then be to have the slower nodes as cold nodes and the faster ones as hot nodes. I have done some reading and see that I can set this in the elasticsearch.yml file, but what I'm wondering is whether there will be any issues with having a cluster made up of nothing but cold nodes. Given the time it takes to cycle them, I'd rather set them up with the role now. Until I remove the last node that has no role set it may not be an issue, but once the cluster is nothing but cold nodes, will I have problems?
Is it more of a priority system, so that if all that's available is low priority then low is what gets used? Or is it more that data goes into a hot node first and gets moved to cold when it's no longer being used, meaning you need a hot node there to take new data and I should leave setting this until those are in place?
Elasticsearch doesn't make any fundamental distinction between "cold nodes" and any other node. It's more about using user-defined attributes and user-defined allocation filters to allocate shards across the different tiers as appropriate for the differences in their hardware. But you get to choose what the attributes are and what filters to apply. So no, there is no real problem with declaring every node in your cluster to be cold.
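For example, it might look something like this (the attribute name `temperature` and its values are entirely arbitrary, just names you choose yourself):

```
# elasticsearch.yml on each of the slower nodes
node.attr.temperature: cold

# elasticsearch.yml on the faster nodes, once they exist
node.attr.temperature: hot

# then pin an index onto the cold nodes with an allocation filter
PUT my-index/_settings
{
  "index.routing.allocation.require.temperature": "cold"
}
```

Until you have nodes with a different attribute value, a filter like this is simply satisfied by every node, which is why an all-cold cluster is not a problem.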
However, I'm curious about how you plan on combining your two clusters into one. Moving nodes from one cluster to another is unsafe and seriously risks losing data. If you are on 7.x you cannot move nodes into a different cluster for this reason. Earlier versions did allow this but you should consider this a bug not a feature and definitely avoid relying on it. Instead, I recommend adding new empty nodes to the target cluster and then using snapshot/restore or reindex-from-remote to import the data from the other cluster.
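A reindex-from-remote, for instance, looks roughly like this (the host and index names are placeholders); note the old cluster has to be whitelisted in the new cluster's elasticsearch.yml first:

```
# elasticsearch.yml on the target cluster
reindex.remote.whitelist: "cluster-a.example.com:9200"

# run against the target cluster; pulls documents over HTTP from cluster A
POST _reindex
{
  "source": {
    "remote": { "host": "http://cluster-a.example.com:9200" },
    "index": "my-index"
  },
  "dest": { "index": "my-index" }
}
```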
I did a little extra reading and it looks like there is a lot more at play to be able to use it. I'm guessing the attribute is much like when you say "move all shards to node 3" before doing a shrink on an index, except it targets a collection of nodes rather than one that is explicitly named.
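For comparison, the shrink prep I had in mind pins everything to one explicitly named node, something like this ("node-3" is just an example name):

```
# force every shard copy onto one named node and block writes,
# as the shrink API requires
PUT my-index/_settings
{
  "index.routing.allocation.require._name": "node-3",
  "index.blocks.write": true
}
```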
As for the migration, I won't be moving the nodes from cluster A into cluster B. The hardware on cluster A is going to be refreshed, so the replacement hardware will be added to cluster B while cluster A continues to run in parallel.
With the new nodes online I will then suspend the system that uses Elasticsearch and, over a weekend, reindex or (as you suggest) snapshot all the data in the indexes needed for normal operations, resume normal operations, and then migrate the remainder of the indexes before powering down cluster A. With snapshotting I'm thinking it could be faster for me: I could get a pair of USB drives, snapshot to one drive, move it to the target cluster to restore while snapshotting again to the second drive, and rotate them round until I've transferred at least as much as I need to bring things back online.
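From the docs, I think the repository setup for a mounted drive would look roughly like this (the paths and names are just illustrative), with the mount point listed under `path.repo` on every node:

```
# elasticsearch.yml on every node that can see the mounted drive
path.repo: ["/mnt/usb1"]

# register the repository and take a snapshot on cluster A
PUT _snapshot/usb_backup
{
  "type": "fs",
  "settings": { "location": "/mnt/usb1" }
}
PUT _snapshot/usb_backup/weekend-1?wait_for_completion=false

# after moving the drive: register the same repository on cluster B, then restore
POST _snapshot/usb_backup/weekend-1/_restore
{
  "indices": "index-1,index-2"
}
```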