ES is allocating indexes that are hot to nodes in the cold tier

Russell_Fulton · January 11, 2023, 9:52pm

I can't figure out how to diagnose what is happening here.

I am using a hot/warm/cold data tier model, but ES keeps allocating indexes to my overloaded cold tier while they are still in the hot phase (being actively written to at high data rates), which is causing a number of performance issues.

The nodes in the hot tier have over a TB of free space, and the cold tier nodes are being pushed over their high watermark, at which point ES moves things around but discards ingested traffic while it does so.

warkolm · January 11, 2023, 9:54pm

Please share more information, we cannot provide much help with what you've provided here.

leandrojmp · January 11, 2023, 10:40pm

What is the version of your cluster?

Are you using Elastic cloud or self-managed?

What are the roles of your hot, warm and cold nodes?

Do you have any allocation setting on your indices templates?

Russell_Fulton · January 11, 2023, 11:06pm

Thanks Mark. I was not asking for solutions, I was asking for help diagnosing the problem, i.e. pointers to things to look at to get a handle on it. I really don't know where to start.

I can pick one index -- what data would be useful?

Russell_Fulton · January 12, 2023, 12:06am

OK. On-prem, ES version 7.17.1

I have two "hot" nodes with identical configuration:
node.roles: [master, ingest, data, data_hot, data_warm, data_cold]
Only one warm node (yes, I want another):
node.roles: [ "master", "data", "data_warm" ]
Two cold nodes:
node.roles: [ "data", "data_cold" ]

One of the indexes that is having problems has these settings:

{
  ".ds-sec-events-2023.01.03-000005": {
    "settings": {
      "index": {
        "lifecycle": {
          "name": "sec-events-policy"
        },
        "routing": {
          "allocation": {
            "include": {
              "_tier_preference": "data_hot"
            }
          }
        },
        "hidden": "true",
        "number_of_shards": "2",
        "provided_name": ".ds-sec-events-2023.01.03-000005",
        "creation_date": "1672786115725",
        "priority": "100",
        "number_of_replicas": "1",
        "uuid": "gLrEL3jRQTuL2ZR6ocJIMA",
        "version": {
          "created": "7170199"
        }
      }
    }
  }
}

This index has 2 primary shards allocated to cold and warm nodes, and the replicas on the hot nodes.

Is ES refusing to allocate more than one shard per node, i.e. the primary of one shard and the replica of the other? If so, I should reduce the shards to 1.
Obviously it won't allocate both the primary and the replica of the same shard to the same node.

DavidTurner · January 12, 2023, 9:12am

Hi @Russell_Fulton, I think you are looking for the allocation explain API.

If you don't understand why a shard is allocated somewhere, this API will give you all the details. If you need help understanding the output, share it here.
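As a rough illustration of how that API might be queried for one shard of the index posted above, here is a minimal Python sketch using only the standard library. The cluster address `http://localhost:9200` and the helper names `build_explain_request` and `explain_allocation` are assumptions for illustration, and a secured cluster would also need authentication, which is omitted here.

```python
import json
import urllib.parse
import urllib.request

# Assumption: cluster reachable without authentication at this address.
ES_URL = "http://localhost:9200"


def build_explain_request(index, shard=0, primary=True, include_yes_decisions=False):
    """Return (url, body) for a cluster allocation explain request."""
    query = urllib.parse.urlencode(
        {"include_yes_decisions": str(include_yes_decisions).lower()}
    )
    url = f"{ES_URL}/_cluster/allocation/explain?{query}"
    # The body identifies exactly one shard copy: index name, shard number,
    # and whether the primary or a replica is meant.
    body = {"index": index, "shard": shard, "primary": primary}
    return url, body


def explain_allocation(index, shard=0, primary=True, include_yes_decisions=False):
    """Send the request and return the parsed JSON explanation."""
    url, body = build_explain_request(index, shard, primary, include_yes_decisions)
    request = urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

For example, `explain_allocation(".ds-sec-events-2023.01.03-000005", shard=0, primary=True, include_yes_decisions=True)` should return the per-node decisions for that primary, including the deciders that answered yes.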

Christian_Dahlqvist · January 12, 2023, 9:16am

All your nodes have the data role, which I believe means they can hold any type of data.
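To make that reading concrete, here is a small Python sketch. It is a deliberately simplified model, not Elasticsearch's actual allocator, and it rests on the assumption stated above: a node with the generic data role counts as a member of every tier. The roles are the ones posted in this thread; the node names and function names are hypothetical.

```python
# Simplified model only, NOT Elasticsearch's real allocation logic.
# Assumption: the generic "data" role makes a node eligible for every tier.


def node_in_tier(roles, tier):
    """True if the node has the tier's role or the generic data role."""
    return tier in roles or "data" in roles


def eligible_nodes(nodes, tier_preference):
    """Return the nodes in the first preferred tier that has any member."""
    for tier in tier_preference.split(","):
        matches = sorted(
            name for name, roles in nodes.items() if node_in_tier(roles, tier)
        )
        if matches:
            return matches
    return []


# Roles as posted in this thread; node names are hypothetical.
POSTED_ROLES = {
    "hot-1": ["master", "ingest", "data", "data_hot", "data_warm", "data_cold"],
    "hot-2": ["master", "ingest", "data", "data_hot", "data_warm", "data_cold"],
    "warm-1": ["master", "data", "data_warm"],
    "cold-1": ["data", "data_cold"],
    "cold-2": ["data", "data_cold"],
}

# The same cluster with the generic role removed from every node.
WITHOUT_GENERIC = {
    name: [role for role in roles if role != "data"]
    for name, roles in POSTED_ROLES.items()
}

print(eligible_nodes(POSTED_ROLES, "data_hot"))
# ['cold-1', 'cold-2', 'hot-1', 'hot-2', 'warm-1']
print(eligible_nodes(WITHOUT_GENERIC, "data_hot"))
# ['hot-1', 'hot-2']
```

Under this model, an index with `_tier_preference: data_hot` may land on any of the five nodes as long as they all carry the generic role, which would match the behaviour described in the original post.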

leandrojmp · January 12, 2023, 12:50pm

I think it is related to what @Christian_Dahlqvist said: you have the generic data role on your nodes.

The documentation does not help much in this case, it just says this:

In a multi-tier deployment architecture, you use specialized data roles to assign data nodes to specific tiers: data_content, data_hot, data_warm, data_cold, or data_frozen. A node can belong to multiple tiers, but a node that has one of the specialized data roles cannot have the generic data role.

It says that a node with a specialized data role cannot have the generic data role, but Elasticsearch starts without any issue or warning about this, if I'm not wrong. So it is not clear what will happen if you have both a specialized data role and the generic one; I would assume that the generic one takes precedence and the specialized one is ignored.

Also, you have a mixed node with data_hot, data_warm and data_cold. I'm not sure how this would work out, as Elasticsearch would try to balance the number of shards between the tiers and you have a node that belongs to multiple tiers.

The best way to troubleshoot what is happening is to use the cluster allocation explain API with the include_yes_decisions parameter.

Russell_Fulton · January 12, 2023, 6:30pm

Thanks! I have read and re-read the docs around the data roles and come to different conclusions at different times :(

I know that when I initially added the cold nodes they did not have the data role; I changed that at some stage and now can't remember the reasoning. One of the big issues with the cluster as it is now is that I only have one warm node. I know I need two. In the meantime I have to allow warm data on the hot nodes so that the replicas of the warm shards have somewhere to go.

I will try removing the data role from the two cold nodes. I assume ES will then migrate the non-cold shards off them.

Thanks to all of you who responded and yes I will look at the explain api (again).

DavidTurner · January 12, 2023, 7:07pm

Just to emphasise this: there's lots you can do to second-guess the allocation rules if you have enough experience, but that's no help to most users. The allocation explain API is the first thing to try in cases like this. And to repeat: if it's difficult to understand the output then please ask for help. It'll help us improve it in future versions.

system · February 9, 2023, 7:07pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
