ES is allocating indexes that are hot to nodes in the cold tier

I can't figure out how to diagnose what is happening here.

I am using a hot/warm/cold data tier model but ES keeps allocating indexes to my overloaded cold tier when they are still in the hot phase (being actively written to with high data rates) which is causing a number of performance issues.

The nodes in the hot tier have over a TB of free space, while the cold tier nodes are being pushed over their high watermark, at which point ES moves things around but discards ingested traffic while it does so.

Please share more information, we cannot provide much help with what you've provided here.

What is the version of your cluster?

Are you using Elastic cloud or self-managed?

What are the roles of your hot, warm and cold nodes?

Do you have any allocation settings in your index templates?

Thanks Mark, I was not asking for solutions, I was asking for help diagnosing the problem, i.e. pointers to things to look at to get a handle on it. I really don't know where to start.

I can pick one index -- what data would be useful?

OK. On-prem ES version 7.17.1

I have two "hot" nodes with identical configuration:
node.roles: [master, ingest, data, data_hot, data_warm, data_cold]
only one warm node (yes, I want another)
node.roles: [ "master", "data", "data_warm" ],
two cold nodes
node.roles: [ "data", "data_cold" ],

One of the indexes that is having problems has these settings:

{
  ".ds-sec-events-2023.01.03-000005": {
    "settings": {
      "index": {
        "lifecycle": {
          "name": "sec-events-policy"
        },
        "routing": {
          "allocation": {
            "include": {
              "_tier_preference": "data_hot"
            }
          }
        },
        "hidden": "true",
        "number_of_shards": "2",
        "provided_name": ".ds-sec-events-2023.01.03-000005",
        "creation_date": "1672786115725",
        "priority": "100",
        "number_of_replicas": "1",
        "uuid": "gLrEL3jRQTuL2ZR6ocJIMA",
        "version": {
          "created": "7170199"
        }
      }
    }
  }
}

This index has its 2 primary shards allocated to cold and warm nodes and the replicas on the hot nodes.

Is ES refusing to allocate more than one shard of an index per node, i.e. the primary of one shard and the replica of the other? If so, I should reduce the number of shards to 1.
Obviously it won't allocate both the primary and the replica to the same node.

Hi @Russell_Fulton, I think you are looking for the allocation explain API:

If you don't understand why a shard is allocated somewhere, this API will give you all the details. If you need help understanding the output, share it here.
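As a minimal sketch of what such a call could look like for the index posted above: the base URL `http://localhost:9200`, the lack of authentication and TLS, and the helper names `build_explain_request` and `explain_shard` are all assumptions for illustration, not details from this thread.

```python
import json
import urllib.request


def build_explain_request(base_url, index, shard, primary):
    """Return (url, body_bytes) for a cluster allocation explain call."""
    url = base_url.rstrip("/") + "/_cluster/allocation/explain"
    # The request body names the single shard copy to be explained.
    body = {"index": index, "shard": shard, "primary": primary}
    return url, json.dumps(body).encode("utf-8")


def explain_shard(base_url, index, shard, primary):
    """Send the request and return the parsed JSON response.

    Assumes an unsecured cluster reachable at base_url (hypothetical).
    """
    url, body = build_explain_request(base_url, index, shard, primary)
    req = urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


if __name__ == "__main__":
    # Explain the primary of shard 0 of the index from the post above.
    result = explain_shard(
        "http://localhost:9200",  # hypothetical address
        ".ds-sec-events-2023.01.03-000005",
        0,
        True,
    )
    print(json.dumps(result, indent=2))
```

Running it once per shard copy (primary true and false, shards 0 and 1) gives one explanation for each of the four copies of this index.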

All your nodes have the data role, which I believe means they can hold any type of data.
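To make that concrete, here is a small sketch using the roles posted above. It assumes, as suggested in this reply and not verified against the actual cluster, that a node with the generic data role counts as a member of every tier. The node names are hypothetical labels.

```python
# Roles as posted in the thread; node names are made-up labels.
HOT_ROLES = ["master", "ingest", "data", "data_hot", "data_warm", "data_cold"]
NODE_ROLES = {
    "hot-1": list(HOT_ROLES),
    "hot-2": list(HOT_ROLES),
    "warm-1": ["master", "data", "data_warm"],
    "cold-1": ["data", "data_cold"],
    "cold-2": ["data", "data_cold"],
}


def eligible_nodes(tier, node_roles):
    """Nodes that could hold shards preferring `tier`.

    Assumption: the generic "data" role matches every tier.
    """
    return sorted(
        name
        for name, roles in node_roles.items()
        if tier in roles or "data" in roles
    )


def without_generic_data(node_roles, nodes):
    """Copy of node_roles with "data" removed from the given nodes."""
    return {
        name: [r for r in roles if not (r == "data" and name in nodes)]
        for name, roles in node_roles.items()
    }


if __name__ == "__main__":
    print(eligible_nodes("data_hot", NODE_ROLES))
    trimmed = without_generic_data(NODE_ROLES, {"cold-1", "cold-2"})
    print(eligible_nodes("data_hot", trimmed))
```

Under that assumption, every node qualifies for an index with `_tier_preference` set to `data_hot`, which would match the behaviour described in the original post. Removing the generic role from the cold nodes takes them out of the candidate list, although the warm node stays in it as long as it also carries `data`.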

I think it is related to what @Christian_Dahlqvist said: you have the generic data role on your nodes.

The documentation does not help much in this case, it just says this:

In a multi-tier deployment architecture, you use specialized data roles to assign data nodes to specific tiers: data_content, data_hot, data_warm, data_cold, or data_frozen. A node can belong to multiple tiers, but a node that has one of the specialized data roles cannot have the generic data role.

It says that a node with a specialized data role cannot have the generic data role, but if I'm not wrong, Elasticsearch starts without any issue or warning about this. So it is not clear what happens if you have both a specialized data role and the generic one. I would assume that the generic one takes precedence and the specialized one is ignored.

Also, you have mixed nodes with data_hot, data_warm and data_cold. I'm not sure how this would work out, as Elasticsearch would try to balance the number of shards between the tiers and you have nodes that belong to multiple tiers.

The best way to troubleshoot what is happening is to use the cluster allocation explain API with the include_yes_decisions parameter.
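With `include_yes_decisions=true` added as a query parameter, the response should list the deciders that said YES alongside those that said NO, which can make it long. The sketch below groups the decider names per node so the blocking ones stand out. The `SAMPLE` response is invented for illustration (node name and decider results are not from this cluster), and the field names follow the response layout as I understand it, so treat them as assumptions to check against real output.

```python
def summarize_deciders(explain):
    """Map node name -> {decision: [decider names]} from an explain response."""
    summary = {}
    for node in explain.get("node_allocation_decisions", []):
        by_decision = {}
        for decider in node.get("deciders", []):
            by_decision.setdefault(decider["decision"], []).append(decider["decider"])
        summary[node["node_name"]] = by_decision
    return summary


# Hypothetical, trimmed response; NOT output from the poster's cluster.
SAMPLE = {
    "index": ".ds-sec-events-2023.01.03-000005",
    "shard": 0,
    "primary": True,
    "node_allocation_decisions": [
        {
            "node_name": "hot-1",  # made-up node name
            "node_decision": "no",
            "deciders": [
                {"decider": "same_shard", "decision": "NO",
                 "explanation": "(explanation text from the cluster)"},
                {"decider": "data_tier", "decision": "YES",
                 "explanation": "(explanation text from the cluster)"},
                {"decider": "disk_threshold", "decision": "YES",
                 "explanation": "(explanation text from the cluster)"},
            ],
        },
    ],
}

if __name__ == "__main__":
    for node, decisions in summarize_deciders(SAMPLE).items():
        print(node, decisions)
```

The full `explanation` strings are still worth reading for any decider that returns NO; the summary only helps to find them.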

Thanks! I have read and re-read the docs around the data roles and come to different conclusions at different times :(

I know that when I initially added the cold nodes I did not have the data role, I changed that at some stage and now can't remember the reasoning. One of the big issues with the cluster as it is now is that I only have one warm node. I know I need two -- I have to have data on the hot nodes to allow somewhere for the replicas of the warm shards.

I will try removing data role from the two cold nodes -- I assume ES will then migrate off the non cold shards.

Thanks to all of you who responded and yes I will look at the explain api (again).

Just to emphasise this: there's lots you can do to second-guess the allocation rules if you have enough experience, but that's no help to most users. The allocation explain API is the first thing to try in cases like this. And to repeat: if it's difficult to understand the output then please ask for help. It'll help us improve it in future versions.
