I am trying to use the _tier_preference setting of "data_hot,data_warm" so that the primary shard of every new index goes onto the HOT node and the replica for that index goes onto the WARM node.
However, the primary is allocated on the hot node, but the index then goes into yellow state because the replica can't be allocated.
"index has a preference for tiers [data_hot,data_warm] and node does not meet the required [data_hot] tier"
The node it is selecting does in fact not have data_hot, but it does have data_warm. Why is it not going to the next tier?
The documentation seems to say it should be allocating to the next available node that meets the tier criteria.
I don't think this is possible; this behavior does not make sense when using data tiers.
When you set _tier_preference to data_hot,data_warm you are telling Elasticsearch that nodes with the data_hot role are preferred when allocating shards for the index, and that the shards should be allocated on nodes with the data_warm role only if there are no nodes with the data_hot role in the cluster.
If you have both roles in the cluster, then the preference order applies to both primaries and replicas: every shard copy goes to the first tier in the list that has nodes, so you cannot have a primary shard on a data_hot node and a replica shard on a data_warm node.
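To make the rule above concrete, here is a rough Python model of it. This is only a sketch of the behavior as described in this thread, not Elasticsearch's actual allocation code; the node names and the helper functions `preferred_tier` and `allocate` are made up for illustration. It also assumes the usual rule that two copies of the same shard never share a node.

```python
from typing import Dict, List, Optional, Set, Tuple


def preferred_tier(preference: str, nodes: Dict[str, Set[str]]) -> Optional[str]:
    """Return the first tier in the preference list that at least one node has."""
    for tier in preference.split(","):
        if any(tier in roles for roles in nodes.values()):
            return tier
    return None


def allocate(
    preference: str, nodes: Dict[str, Set[str]], copies: int
) -> Tuple[Optional[str], List[str], int]:
    """Place `copies` shard copies (primary + replicas) on the preferred tier only.

    Each copy needs its own node, so any copies beyond the number of
    eligible nodes stay unassigned (the index goes yellow).
    """
    tier = preferred_tier(preference, nodes)
    eligible = [name for name, roles in nodes.items() if tier in roles]
    placed = eligible[:copies]
    return tier, placed, copies - len(placed)


# Hypothetical two-node cluster like the one in the question.
cluster = {"node-hot": {"data_hot"}, "node-warm": {"data_warm"}}
print(allocate("data_hot,data_warm", cluster, copies=2))
# -> ('data_hot', ['node-hot'], 1)
```

With one hot node and one warm node, the resolved tier is data_hot, the primary lands on the hot node, and the replica has nowhere to go because the warm node is never considered while a hot node exists.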
Also, the post you referenced does not describe the same thing you want to do.
Setting _tier_preference to null basically disables it and will allow the shards to be allocated on any node.
To remove the data tier preference setting, set the _tier_preference value to null. This allows the index to allocate to any data node within the cluster. Setting the _tier_preference to null does not restore the default value.
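For reference, the full name of the setting is index.routing.allocation.include._tier_preference, and clearing it is an index settings update. The snippet below is a small sketch that only builds and prints the request; the index name `my-index` is a placeholder, and nothing is sent to a cluster.

```python
import json

# Full name of the index-level tier preference setting.
TIER_PREFERENCE = "index.routing.allocation.include._tier_preference"


def clear_tier_preference_body() -> str:
    """Build the JSON body that sets the tier preference to null."""
    return json.dumps({TIER_PREFERENCE: None})


# "my-index" is a hypothetical index name used for illustration.
print("PUT /my-index/_settings")
print(clear_tier_preference_body())
```

Python's `None` is serialized as JSON `null`, which is the value the documentation quoted above asks for.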
Can you provide more context on why you want to have a primary on a hot node and a replica on a warm node? As mentioned, this does not make sense when using data tiers, as hot nodes and warm nodes are expected to have different hardware configurations for different goals.
The primary goal is that I want all new data coming in to be on the hot SSD storage node and the replica for those indexes on the warm node. I only have two nodes in my cluster.
The idea is that all new data is fast to query on the size-limited SSD, but has a backup on the warm node's larger HDD RAID storage.
I want to ensure there is always a replica but that it can be placed on the warm node.
To use data tiers and have replicas you need at least 2 nodes of the same tier, so you would need 2 hot nodes and 2 warm nodes.
With 2 nodes only, you should configure both nodes to have the same data tiers.
Also, a 2 node cluster is not resilient to failures, so sometimes it makes more sense to go without replicas and accept that risk. You can read more about resilience in small clusters in this documentation.
In your case, what you could do is have a hot node and a warm node, both without replicas; then you could have newer data on the hot tier and older data on the warm tier.
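As a sketch of what that could look like, the snippet below builds index settings with zero replicas for a new index on the hot tier, and the settings you would apply later to move it to the warm tier. The helper `tier_settings` is made up for illustration, and the "data_warm,data_hot" ordering for older indices is my assumption about a sensible fallback order; the snippet only prints JSON and does not contact a cluster.

```python
import json
from typing import Dict, Union

TIER_PREFERENCE = "index.routing.allocation.include._tier_preference"
REPLICAS = "index.number_of_replicas"


def tier_settings(tier_preference: str, replicas: int = 0) -> Dict[str, Union[str, int]]:
    """Return index settings for the given tier preference and replica count."""
    return {REPLICAS: replicas, TIER_PREFERENCE: tier_preference}


# New index: hot tier only, no replicas.
print(json.dumps(tier_settings("data_hot")))

# Older index: prefer the warm tier, fall back to hot if no warm node exists.
print(json.dumps(tier_settings("data_warm,data_hot")))
```

Keep in mind that with zero replicas, losing either node means losing the indices that live only on it.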
To have replicas in both cases you would need an extra hot node with the same specs and an extra warm node as well.