DataNode disk full despite flood_stage configuration

gidonshn · December 5, 2022, 12:37pm

Hi All.

I have a cluster (7.16.2) running on k8s, with multiple DataNodes, dedicated ClientNodes, and dedicated MasterNodes.
Some of the DataNodes are "hot" data nodes, and some of them are "warm".
Indices are created daily, and all of them are managed using index templates and index lifecycle policies (hot -> warm -> delete).

My watermarks are at their default values:
cluster.routing.allocation.disk.watermark.low=85%
cluster.routing.allocation.disk.watermark.high=90%
cluster.routing.allocation.disk.watermark.flood_stage=95%
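To make the three thresholds concrete, here is a minimal sketch, assuming percentage-based watermarks are read as percentages of *used* disk (which is how the defaults above are documented). `Watermarks` and `watermark_stage` are hypothetical helpers for illustration only; they do not talk to Elasticsearch, and the comments summarize the documented behaviour rather than reproduce it exactly.

```python
from dataclasses import dataclass


# Hypothetical container mirroring the default watermark percentages above.
@dataclass(frozen=True)
class Watermarks:
    low: float = 85.0
    high: float = 90.0
    flood_stage: float = 95.0


def watermark_stage(used_pct: float, wm: Watermarks = Watermarks()) -> str:
    """Return the highest watermark that a node's disk usage exceeds."""
    if used_pct > wm.flood_stage:
        # Indices with a shard on this node get index.blocks.read_only_allow_delete.
        return "flood_stage"
    if used_pct > wm.high:
        # Elasticsearch tries to relocate shards away from this node.
        return "high"
    if used_pct > wm.low:
        # Elasticsearch stops allocating (most) new shards to this node.
        return "low"
    return "ok"


for pct in (80, 87, 92, 97, 100):
    print(pct, watermark_stage(pct))
```

Note that the flood-stage block only stops indexing into the affected indices; it does not by itself stop other disk activity on the node.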

Some of my indices have replica shards and some do not.

One of the "hot" DataNodes (which is a pod) got to 100% disk usage and couldn't rejoin the cluster.
How can that happen?
Shouldn't the flood_stage parameter mark the indices with shards on the node as "ReadOnly" and stop all writes to the disk?
Is there a setting I could have missed or misconfigured?

Thanks!

warkolm · December 6, 2022, 5:36am

Welcome to our community!

How much disk space do your nodes have? Because yes, it should stop writing, but there's always a bit of wiggle room due to things like merges or reallocation that may happen. So if your node only has 5GB free at 95% and your indices are tens of GB in size, that may explain it. But I am guessing there without more info.

Rios · December 6, 2022, 7:24am

It isn't necessarily only ES that fills your file system. Other things might contribute as well: paging, logs, other processes.

gidonshn · December 6, 2022, 7:31am

Hi @warkolm
Thanks for the welcome, and for the quick response.
My "hot" nodes have 7TB of disk each (a persistent volume attached to the pod).
So 5% of that is 350GB.
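The headroom arithmetic can be spelled out as a small sketch, assuming decimal units (1 TB = 1000 GB) and the default percentage watermarks. `free_at_watermark_gb` is a hypothetical helper, not an Elasticsearch API.

```python
def free_at_watermark_gb(disk_gb: float, used_pct: float) -> float:
    """Free space (GB) left on a disk of disk_gb once usage reaches used_pct percent."""
    return disk_gb * (100.0 - used_pct) / 100.0


DISK_GB = 7000  # 7TB hot-node volume, assuming decimal units

for name, pct in (("low", 85), ("high", 90), ("flood_stage", 95)):
    free = free_at_watermark_gb(DISK_GB, pct)
    print(f"{name:12s} {pct}% used -> {free:.0f} GB free")
```

With these assumptions, about 350GB is still free when the flood-stage block applies (1050GB at the low watermark, 700GB at the high one). Whether that margin is enough depends on how much disk activity, such as the merges or relocations mentioned above, continues after writes are blocked.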

gidonshn · December 6, 2022, 7:40am

Hi @Rios
Thanks for the response.
The file system that filled up is used only for ES data, since it's a persistent volume mounted into the pod at the data directory.
The mount point is:
/usr/share/elasticsearch/data

system · January 3, 2023, 7:40am

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
