Understanding tiering in Elastic

Hello,

I started playing with tiering on my Elastic cluster (in development, of course). I have 5 nodes; the first 4 are hot_data and the fifth one is cold_data.

This was working fine.

Then, today I added a frozen_data node, but on the first start of this new node, it was misconfigured as a cold_data node instead of frozen_data. When the node came online with the wrong tier, data started to be moved to that new node. I then stopped the elasticsearch service, made the change to frozen_data, and restarted it.
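
For reference, the tier role comes from node.roles in elasticsearch.yml; the fix on the new node amounts to swapping data_cold for data_frozen there. The role list below matches what the allocation explain output further down reports for that node; the "before" line is an assumption about how it was misconfigured:

# elasticsearch.yml on elastic-dev06 (the new node)
# before the fix (assumed): node.roles: [ data_cold, remote_cluster_client, transform ]
node.roles: [ data_frozen, remote_cluster_client, transform ]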

Since then, all shards that were previously on this node have become "unassigned" shards, and after changing the new node from cold to frozen, those shards remain forever in an unassigned state.
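
For reference, the stuck copies can be listed with something like this (the column list is just an example of standard _cat/shards columns):

GET _cat/shards?v&h=index,shard,prirep,state,unassigned.reason&s=state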

I've tried to play with the API, but I'm always getting errors such as this:

POST /_cluster/reroute?metric=none
{
  "commands": [
    {
      "allocate_replica": {
        "index": "ls-sys-win-ex-2024.05.24-002297",
        "shard": 0,
        "node": "elastic-dev01.domain.com"
      }
    }
  ]
}
  "error": {
    "root_cause": [
      {
        "type": "illegal_argument_exception",
        "reason": "[allocate_replica] allocation of [ls-sys-win-ex-2024.05.24-002297][0] on node {elastic-dev01.domain.com}{WXny64q1RqW6vFc-q3VnTQ}{AK5DlihTSbyf2CosZj7VRQ}{elastic-dev01.domain.com}{10.32.14.232}{10.32.14.232:9300}{hmrst}{8.11.1}{7000099-8500003}{xpack.installed=true, ml.config_version=11.0.0, transform.config_version=10.0.0} is not allowed, reason: [YES(shard has no previous failures)][YES(primary shard for this replica is already active)][YES(explicitly ignoring any disabling of allocation due to manual allocation commands via the reroute API)][YES(can allocate replica shard to a node with version [8.11.1] since this is equal-or-newer than the primary version [8.11.1])][YES(the shard is not being snapshotted)][YES(ignored as shard is not being recovered from a snapshot)][YES(no nodes are shutting down)][YES(there are no ongoing node replacements)][YES(node passes include/exclude/require filters)][YES(none of the nodes on this host hold a copy of this shard)][YES(enough disk for shard on node, free: [509.7gb], used: [85.7%], shard size: [0b], free after allocating shard: [509.7gb])][YES(below shard recovery limit of outgoing: [0 < 5] incoming: [0 < 5])][YES(total shard limits are disabled: [index: -1, cluster: -1] <= 0)][YES(allocation awareness is not enabled, set cluster setting [cluster.routing.allocation.awareness.attributes] to enable it)][NO(index has a preference for tiers [data_cold,data_warm,data_hot] and node does not meet the required [data_cold] tier)][YES(shard is not a follower and is not under the purview of this decider)][YES(decider only applicable for indices backed by searchable snapshots)][YES(this decider only applies to indices backed by searchable snapshots)][YES(decider only applicable for indices backed by searchable snapshots)][YES(this node's data roles are not exactly [data_frozen] so it is not a dedicated frozen node)][YES(decider only applicable for indices backed by archive functionality)]"
      }
    ]

From that error, I can see this:
[NO(index has a preference for tiers [data_cold,data_warm,data_hot] and node does not meet the required [data_cold] tier)]
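
The tier preference the decider refers to is a per-index setting; it can be inspected with something like this (filter_path only trims the response):

GET ls-sys-win-ex-2024.05.24-002297/_settings?filter_path=*.settings.index.routing.allocation.include._tier_preference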

Looking at this particular shard gives me this:
index                           shard prirep state        docs   store dataset ip           node
ls-sys-win-ex-2024.05.24-002297 0     p      STARTED    329774 290.3mb 290.3mb 10.32.14.236 elastic-dev05.domain.com
ls-sys-win-ex-2024.05.24-002297 0     r      UNASSIGNED

We can see that its primary has been assigned to node elastic-dev05.domain.com and its replica remains UNASSIGNED. I tried to assign the replica to a data_hot node using the above command, but it does not allow me to do this.
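
For reference, the explain output below can be produced with a request along these lines (index and shard values taken from above):

GET _cluster/allocation/explain
{
  "index": "ls-sys-win-ex-2024.05.24-002297",
  "shard": 0,
  "primary": false
}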

Looking at the explain command, I have this:

{
  "index": "ls-sys-win-ex-2024.05.24-002297",
  "shard": 0,
  "primary": false,
  "current_state": "unassigned",
  "unassigned_info": {
    "reason": "NODE_LEFT",
    "at": "2024-05-27T13:47:11.350Z",
    "details": "node_left [kFcEvwAZRnW6qndhUc3JvA]",
    "last_allocation_status": "no_attempt"
  },
  "can_allocate": "no",
  "allocate_explanation": "Elasticsearch isn't allowed to allocate this shard to any of the nodes in the cluster. Choose a node to which you expect this shard to be allocated, find this node in the node-by-node explanation, and address the reasons which prevent Elasticsearch from allocating this shard there.",
  "node_allocation_decisions": [
    {
      "node_id": "-DYuVZULTtuet3aJOaThIA",
      "node_name": "elastic-dev04.domain.com",
      "transport_address": "10.32.14.235:9300",
      "node_attributes": {
        "ml.max_jvm_size": "8258584576",
        "ml.config_version": "11.0.0",
        "xpack.installed": "true",
        "transform.config_version": "10.0.0",
        "ml.machine_memory": "16514121728",
        "ml.allocated_processors": "4",
        "ml.allocated_processors_double": "4.0"
      },
      "roles": [
        "data_content",
        "data_hot",
        "ingest",
        "ml",
        "remote_cluster_client",
        "transform"
      ],
      "node_decision": "no",
      "deciders": [
        {
          "decider": "data_tier",
          "decision": "NO",
          "explanation": "index has a preference for tiers [data_cold,data_warm,data_hot] and node does not meet the required [data_cold] tier"
        }
      ]
    },
    {
      "node_id": "WXny64q1RqW6vFc-q3VnTQ",
      "node_name": "elastic-dev01.domain.com",
      "transport_address": "10.32.14.232:9300",
      "node_attributes": {
        "transform.config_version": "10.0.0",
        "ml.config_version": "11.0.0",
        "xpack.installed": "true"
      },
      "roles": [
        "data_content",
        "data_hot",
        "master",
        "remote_cluster_client",
        "transform"
      ],
      "node_decision": "no",
      "deciders": [
        {
          "decider": "data_tier",
          "decision": "NO",
          "explanation": "index has a preference for tiers [data_cold,data_warm,data_hot] and node does not meet the required [data_cold] tier"
        }
      ]
    },
    {
      "node_id": "gcm_oacnQIq3vnbfuQbSnQ",
      "node_name": "elastic-dev02.domain.com",
      "transport_address": "10.32.14.233:9300",
      "node_attributes": {
        "xpack.installed": "true",
        "ml.config_version": "11.0.0",
        "transform.config_version": "10.0.0"
      },
      "roles": [
        "data_content",
        "data_hot",
        "master",
        "remote_cluster_client",
        "transform"
      ],
      "node_decision": "no",
      "deciders": [
        {
          "decider": "data_tier",
          "decision": "NO",
          "explanation": "index has a preference for tiers [data_cold,data_warm,data_hot] and node does not meet the required [data_cold] tier"
        }
      ]
    },
    {
      "node_id": "kFcEvwAZRnW6qndhUc3JvA",
      "node_name": "elastic-dev06.domain.com",
      "transport_address": "10.32.14.237:9300",
      "node_attributes": {
        "xpack.installed": "true",
        "ml.config_version": "11.0.0",
        "transform.config_version": "10.0.0"
      },
      "roles": [
        "data_frozen",
        "remote_cluster_client",
        "transform"
      ],
      "node_decision": "no",
      "deciders": [
        {
          "decider": "disk_threshold",
          "decision": "NO",
          "explanation": "the node is above the low watermark cluster setting [cluster.routing.allocation.disk.watermark.low=250gb], having less than the minimum required [250gb] free space, actual free: [13.3gb], actual used: [97.3%]"
        },
        {
          "decider": "data_tier",
          "decision": "NO",
          "explanation": "index has a preference for tiers [data_cold,data_warm,data_hot] and node does not meet the required [data_cold] tier"
        },
        {
          "decider": "dedicated_frozen_node",
          "decision": "NO",
          "explanation": "this node's data roles are exactly [data_frozen] so it may only hold shards from partially mounted indices, but this index is not a partially mounted index"
        }
      ]
    },
    {
      "node_id": "lFGM7VADRpa1xKAMzBUDvg",
      "node_name": "elastic-dev05.domain.com",
      "transport_address": "10.32.14.236:9300",
      "node_attributes": {
        "ml.max_jvm_size": "8262778880",
        "ml.config_version": "11.0.0",
        "transform.config_version": "10.0.0",
        "xpack.installed": "true",
        "ml.machine_memory": "16514117632",
        "ml.allocated_processors": "4",
        "ml.allocated_processors_double": "4.0"
      },
      "roles": [
        "data_cold",
        "ingest",
        "ml",
        "remote_cluster_client",
        "transform"
      ],
      "node_decision": "no",
      "deciders": [
        {
          "decider": "same_shard",
          "decision": "NO",
          "explanation": "a copy of this shard is already allocated to this node [[ls-sys-win-ex-2024.05.24-002297][0], node[lFGM7VADRpa1xKAMzBUDvg], [P], s[STARTED], a[id=x3fv589DSRiNcQcWe2WggA], failed_attempts[0]]"
        }
      ]
    },
    {
      "node_id": "xmBBBHHmQfytzQkyV6y04A",
      "node_name": "elastic-dev03.domain.com",
      "transport_address": "10.32.14.234:9300",
      "node_attributes": {
        "xpack.installed": "true",
        "ml.config_version": "11.0.0",
        "transform.config_version": "10.0.0"
      },
      "roles": [
        "data_content",
        "data_hot",
        "master",
        "remote_cluster_client",
        "transform"
      ],
      "node_decision": "no",
      "deciders": [
        {
          "decider": "data_tier",
          "decision": "NO",
          "explanation": "index has a preference for tiers [data_cold,data_warm,data_hot] and node does not meet the required [data_cold] tier"
        }
      ]
    }
  ]
}

I've tried allocate_replica targeting this node:

    {
      "node_id": "WXny64q1RqW6vFc-q3VnTQ",
      "node_name": "elastic-dev01.domain.com",
      "transport_address": "10.32.14.232:9300",
      "node_attributes": {
        "transform.config_version": "10.0.0",
        "ml.config_version": "11.0.0",
        "xpack.installed": "true"
      },
      "roles": [
        "data_content",
        "data_hot",
        "master",
        "remote_cluster_client",
        "transform"
      ],
      "node_decision": "no",
      "deciders": [
        {
          "decider": "data_tier",
          "decision": "NO",
          "explanation": "index has a preference for tiers [data_cold,data_warm,data_hot] and node does not meet the required [data_cold] tier"
        }
      ]
    }

And we can see it requires a data_cold node, but I only have one, and the primary shard is already on it.

Is there a way to force a replica to bypass the tiers and place it on a hot_data node?

If you are using data tiering and still want to use replicas, you need at least 2 nodes in each tier.

I don't think so. If you change the tier preference it will affect both primaries and replicas; you cannot have one tier preference for the primary and another one for the replicas.
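
The tier preference is a single per-index setting, so any change applies to every copy of that index's shards. For illustration only (this is not a fix for the replica problem here), it could be changed like this:

PUT ls-sys-win-ex-2024.05.24-002297/_settings
{
  "index.routing.allocation.include._tier_preference": "data_hot"
}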

As mentioned, you need at least 2 nodes in the same tier. If you are planning to have just one node as cold_data, then you need to remove the replicas when moving the data to this tier, as shown in the sketch below.
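
A minimal sketch of two ways to do that, assuming the ls-sys-win-ex-* index pattern from above and a hypothetical ILM policy name: either drop the replicas directly on the indices once they are in the cold tier, or let ILM do it when entering the cold phase through the allocate action.

# Option 1: drop replicas on the existing indices (index pattern is an example)
PUT ls-sys-win-ex-*/_settings
{
  "index": {
    "number_of_replicas": 0
  }
}

# Option 2: have ILM remove replicas in the cold phase (policy name and min_age are placeholders)
PUT _ilm/policy/my-tiering-policy
{
  "policy": {
    "phases": {
      "cold": {
        "min_age": "30d",
        "actions": {
          "allocate": {
            "number_of_replicas": 0
          }
        }
      }
    }
  }
}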


Hi @leandrojmp

Thank you very much for your help, I appreciate it.

I will deploy a second cold node to solve my issue.

Regards!

Yanick