DataNode disk full, despite flood_stage configuration

Hi All.

I have an Elasticsearch cluster (7.16.2) running on k8s, with multiple DataNodes, dedicated ClientNodes, and dedicated MasterNodes.
Some of the DataNodes are "hot" data nodes, and some of them are "warm".
I have daily indices created and all of them are managed using index templates and index lifecycles (hot->warm->delete).

My watermarks are configured as default:
cluster.routing.allocation.disk.watermark.low=85%
cluster.routing.allocation.disk.watermark.high=90%
cluster.routing.allocation.disk.watermark.flood_stage=95%
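
For readers following along, here is a minimal sketch of how the three thresholds above relate to a node's disk usage. It is only an illustration, not Elasticsearch's own code: the function name `watermark_state` and the constants are hypothetical, and the comments summarize the documented behavior of each watermark in rough terms.

```python
# Default disk watermarks quoted in the post (percent of disk used).
LOW = 85.0
HIGH = 90.0
FLOOD_STAGE = 95.0


def watermark_state(used_percent):
    """Return the highest watermark that a given disk usage has crossed.

    Hypothetical helper for illustration only.
    """
    if used_percent >= FLOOD_STAGE:
        # Indices with a shard on this node get the
        # index.blocks.read_only_allow_delete block.
        return "flood_stage"
    if used_percent >= HIGH:
        # Elasticsearch tries to relocate shards away from this node.
        return "high"
    if used_percent >= LOW:
        # Elasticsearch stops allocating new shards to this node.
        return "low"
    return "ok"


for pct in (80.0, 87.5, 92.0, 96.0):
    print(f"{pct}% used -> {watermark_state(pct)}")
```

Note that crossing flood_stage blocks writes to the affected indices; it does not by itself guarantee that nothing else on the node (for example, merges) consumes disk afterwards.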

Some of my indices have replica shards and some do not.

One of the "hot" DataNodes (which is a pod) got to 100% disk usage and couldn't rejoin the cluster.
How can that happen?
Shouldn't the flood_stage parameter mark the indices with shards on the node as "ReadOnly" and stop all writes to the disk?
Is there a configuration I somehow could have missed or misconfigured?

Thanks!

Welcome to our community! :smiley:

How much disk space do your nodes have? Yes, it should stop writing, but there's always a bit of wiggle room because things like merges or shard reallocation may still happen. So if your node has 5GB free at 95% and your indices are tens of GB in size, then that may explain it. But I am guessing there without more info.

It may not be only ES filling up your disk. Other things could contribute as well: paging, logs, other processes.

Hi @warkolm
Thanks for the welcome, and for the quick response.
My "hot" nodes have 7TB of disk each (a persistent volume connected to the pod).
So 5% of that is 350GB.
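
The arithmetic above can be checked with a short sketch. It assumes decimal units (7TB = 7000GB), which is what the 350GB figure implies; the helper name `headroom_gb` is hypothetical.

```python
DISK_GB = 7000  # 7TB per "hot" node, assuming decimal units (1TB = 1000GB)

# Default watermarks from the original post, as percent of disk used.
WATERMARKS = {"low": 85, "high": 90, "flood_stage": 95}


def headroom_gb(disk_gb, used_percent):
    """Free space (in GB) left on the disk when usage reaches used_percent."""
    return disk_gb * (100 - used_percent) / 100


for name, pct in WATERMARKS.items():
    free = headroom_gb(DISK_GB, pct)
    print(f"{name}: {free:.0f}GB free at {pct}% used")
```

With these assumptions, flood_stage leaves 350GB of headroom, so the node would have had to write roughly that much after the block should have applied in order to reach 100%.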

Hi @Rios
Thanks for the response.
The FS that filled up is used only for ES data, since it's a persistent volume mounted into the pod at the data directory.
The mount point is:
/usr/share/elasticsearch/data
