New warm nodes are getting filled fast

Hi All,

In the beginning, we had 4 data nodes. Two were ILM-designated hot nodes and the other two were warm nodes. Newly created indices would sit on the hot nodes during the hot phase and, after 4 weeks of retention, would move to the warm nodes for the warm phase.

Recently we added two more warm nodes to the cluster. But what we have been noticing is that indices are predominantly landing on these new warm nodes instead of being evenly distributed across all 4 warm nodes. How can I stop the indices from populating only the new warm nodes?

Can you share your ILM policy and the config from the warm nodes?

We don't have a set ILM policy as of now, because the naming for the majority of our indices is static rather than dynamic. So we manually run the command

PUT *2021.<week_number>/_settings
{
  "index.routing.allocation.require.data": "warm"
}

every week to move indices from the hot phase to the warm phase.
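For reference, a minimal sketch of an ILM policy that would automate the same move (the policy name is illustrative, and the 28d min_age mirrors the 4-week retention above):

PUT _ilm/policy/hot-warm-policy
{
  "policy": {
    "phases": {
      "hot": {
        "actions": {}
      },
      "warm": {
        "min_age": "28d",
        "actions": {
          "allocate": {
            "require": {
              "data": "warm"
            }
          }
        }
      }
    }
  }
}

For it to take effect, the policy would still need to be attached to the indices via index.lifecycle.name, usually through an index template.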

Following is the config of our new warm node:

cluster.name: "**"
node.name: "data-4"
path.logs: /var/log/elasticsearch
path.data: /datadisks/disk2/elasticsearch/data
discovery.zen.ping.unicast.hosts: ["master-0:9300","master-1:9300","master-2:9300"]
node.master: false
node.data: true
node.attr.data: warm
discovery.zen.minimum_master_nodes: 2
network.host: [_site_, _local_]
node.max_local_storage_nodes: 1
#node.attr.fault_domain:
#node.attr.update_domain:
#cluster.routing.allocation.awareness.attributes: fault_domain,update_domain
#xpack.license.self_generated.type: trial
xpack.security.enabled: true
bootstrap.memory_lock: false

xpack.security.authc.token.enabled: true

xpack.security.authc.realms.native1:
  type: native
  order: 0

xpack.security.authc.realms.saml1:
  type: saml
  order: 2
  idp.metadata.path: saml/idp-external.xml
  idp.entity_id: "^^^"
  sp.entity_id: ""
  sp.acs: "^^^"
  sp.logout: ""
  attributes.principal: "***"
  attributes.groups: "http://schemas.microsoft.com/ws/2008/06/identity/claims/groups"

xpack.security.http.ssl.enabled: true
xpack.security.http.ssl.key: ssl/data-4.key
xpack.security.http.ssl.certificate: ssl/data-4.crt
#xpack.security.http.ssl.key_passphrase: ***

xpack.security.transport.ssl.enabled: true
xpack.security.transport.ssl.verification_mode: certificate
xpack.security.transport.ssl.keystore.path: certs/elastic-certificates.p12
xpack.security.transport.ssl.truststore.path: certs/elastic-certificates.p12
xpack.notification.email.account:
  standard_account:
    profile: standard
    smtp:
      auth: false
      starttls.enable: false

It looks like you have a license above Basic, based on your use of Watcher and SAML in Security, so I would encourage you to reach out to your Support contact about this as well.

What's the output from;

GET /_cat/allocation?v
GET /_cat/nodeattrs?v
shards disk.indices disk.used disk.avail disk.total disk.percent host        ip          node
   830          1tb     1.1tb    804.6gb      1.9tb           60 10.40.10.58 10.40.10.58 cle-data-3
   830          1tb     1.1tb    798.2gb      1.9tb           60 10.40.10.57 10.40.10.57 cle-data-2
   830        1.4tb     1.5tb    383.6gb      1.9tb           80 10.40.10.59 10.40.10.59 cle-data-5
  1005        1.3tb     1.4tb    482.7gb      1.9tb           76 10.40.10.8  10.40.10.8  cle-data-0
   830        1.5tb     1.6tb    344.5gb      1.9tb           82 10.40.10.60 10.40.10.60 cle-data-4
  1005        1.3tb     1.5tb    479.5gb      1.9tb           76 10.40.10.6  10.40.10.6  cle-data-1

cle-master-2 ml.machine_memory 3608973312
cle-master-2 ml.max_open_jobs  20
cle-master-2 xpack.installed   true
cle-master-2 ml.enabled        true
cle-data-2   ml.machine_memory 14706561024
cle-data-2   ml.max_open_jobs  20
cle-data-2   xpack.installed   true
cle-data-2   ml.enabled        true
cle-data-2   data              warm
cle-data-1   ml.machine_memory 59094614016
cle-data-1   ml.max_open_jobs  20
cle-data-1   xpack.installed   true
cle-data-1   ml.enabled        true
cle-data-1   data              hot
cle-data-3   ml.machine_memory 14706561024
cle-data-3   ml.max_open_jobs  20
cle-data-3   xpack.installed   true
cle-data-3   ml.enabled        true
cle-data-3   data              warm
cle-data-4   ml.machine_memory 14677934080
cle-data-4   ml.max_open_jobs  20
cle-data-4   xpack.installed   true
cle-data-4   ml.enabled        true
cle-data-4   data              warm
cle-data-5   ml.machine_memory 14677934080
cle-data-5   ml.max_open_jobs  20
cle-data-5   xpack.installed   true
cle-data-5   ml.enabled        true
cle-data-5   data              warm
cle-master-0 ml.machine_memory 3608965120
cle-master-0 ml.max_open_jobs  20
cle-master-0 xpack.installed   true
cle-master-0 ml.enabled        true
cle-master-1 ml.machine_memory 3608973312
cle-master-1 ml.max_open_jobs  20
cle-master-1 xpack.installed   true
cle-master-1 ml.enabled        true
cle-data-0   ml.machine_memory 59094618112
cle-data-0   ml.max_open_jobs  20
cle-data-0   xpack.installed   true
cle-data-0   ml.enabled        true
cle-data-0   data              hot

Just a note: you have way too many shards for your data size (roughly 830-1000 shards per node holding 1-1.5tb works out to only 1-2GB per shard on average) and are likely overloading your nodes. You should shrink some of your indices if you can; a sketch of that is below.
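As a rough sketch of the shrink route (the index and node names here are only examples): first make the index read-only and co-locate all of its shards on one node, then shrink it into a copy with fewer primaries (the new count must be a factor of the old one):

PUT logs-2021.10/_settings
{
  "index.blocks.write": true,
  "index.routing.allocation.require._name": "cle-data-2"
}

POST logs-2021.10/_shrink/logs-2021.10-shrunk
{
  "settings": {
    "index.number_of_shards": 1,
    "index.routing.allocation.require._name": null,
    "index.blocks.write": null
  }
}

Once the shrunken copy is green you can delete the original and, if needed, alias the new name back to the old one.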

However looking at that output I can see;

cle-data-2   data              warm
cle-data-3   data              warm
cle-data-4   data              warm
cle-data-5   data              warm

And;

shards disk.indices disk.used disk.avail disk.total disk.percent host        ip          node
   830          1tb     1.1tb    804.6gb      1.9tb           60 10.40.10.58 10.40.10.58 cle-data-3
   830          1tb     1.1tb    798.2gb      1.9tb           60 10.40.10.57 10.40.10.57 cle-data-2
   830        1.4tb     1.5tb    383.6gb      1.9tb           80 10.40.10.59 10.40.10.59 cle-data-5
   830        1.5tb     1.6tb    344.5gb      1.9tb           82 10.40.10.60 10.40.10.60 cle-data-4

So all of those nodes have the same shard count on them. It does look like the shard sizes are different though, which would account for what you are seeing.

However, ES balances by shard count first; then, if a node starts to hit the disk watermarks, it will move things around as needed.
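By default those watermarks are percentage based (85% used for low, 90% for high, 95% for flood stage), unless you have overridden them. You can check what the cluster is actually using with:

GET _cluster/settings?include_defaults=true&flat_settings=true

and look for the cluster.routing.allocation.disk.watermark.* keys in the output.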

Earlier the watermark levels were 10gb and 20gb for low and high respectively. I then changed them to the following:

> transient" : {
>     "cluster" : {
>       "routing" : {
>         "rebalance" : {
>           "enable" : "primaries"
>         },
>         "allocation" : {
>           "disk" : {
>             "threshold_enabled" : "true",
>             "watermark" : {
>               "low" : "150gb",
>               "flood_stage" : "10gb",
>               "high" : "100gb"
>             }

But despite this, the free disk space on two of the four warm nodes has dropped to as low as 14gb and 65gb. Why is this happening? Why didn't Elasticsearch stop pushing data to these two nodes once they crossed the 100gb high watermark?

Which nodes have reached the 100GB threshold? According to GET _cat/allocation, all nodes have over 300GB of free space.

Hi David

The GET /_cat/allocation output that I had posted is a couple of weeks old. Today the free space had dropped to 1gb and 65gb. We have done some cleanup work now and the free disk space has risen to around 100gb, so there's no point providing the current allocation output.

Any further advice?

It would be good to see diagnostic info from when the problem was occurring, but if the problem is not occurring any more then there's not a lot we can do.

If it happens again then obtain GET _cat/allocation and also run the cluster allocation explain API on one or more shards on the node that's too full. Logs from the master at the same time might be useful too.
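For example (the index name here is just a placeholder for one of the shards on the full node):

GET _cluster/allocation/explain
{
  "index": "logs-2021.30",
  "shard": 0,
  "primary": true
}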

The problem still exists, David. The free disk space has only increased because we manually deleted indices; otherwise it would have reached 0gb by now.

Sure, when the free disk space on the nodes falls below 100gb, I will post the allocation output and provide the logs from the master.

Hi David,

Is there a way to temporarily stop Elasticsearch from allocating shards to a particular node that is space-constrained, so that it concentrates on the other nodes?

shards disk.indices disk.used disk.avail disk.total disk.percent node
   523        1.2tb     1.3tb    581.6gb      1.9tb           71 cle-data-5
   523          1tb     1.1tb      811gb      1.9tb           59 cle-data-3
   522        1.2tb     1.3tb      587gb      1.9tb           70 cle-data-4
   502        1.7tb     1.8tb    102.3gb      1.9tb           94 cle-data-2

As you can see, cle-data-2 has just 102.3gb free whereas the other nodes have a lot more. So how can I temporarily halt Elasticsearch allocation on cle-data-2?

Yes, that happens by default when the free space reaches the high watermark. I think it's already doing its thing, given that this node has fewer shards than the others. Would you supply the other diagnostic info I asked for above (allocation explain for a shard on the overfull node, plus master logs)?
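If you did want to force shards away from a specific node manually in the meantime, a cluster-level exclusion by node name along these lines would do it; note that it also actively moves existing shards off the node, and you should clear it afterwards by setting the value back to null:

PUT _cluster/settings
{
  "transient": {
    "cluster.routing.allocation.exclude._name": "cle-data-2"
  }
}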

