I have an Elasticsearch cluster (7.16.2) running on k8s, with multiple data nodes, dedicated client nodes, and dedicated master nodes.
Some of the DataNodes are "hot" data nodes, and some of them are "warm".
I have daily indices, all of them managed using index templates and index lifecycle policies (hot -> warm -> delete).
My watermarks are configured as default:
Some of my indices have replica shards and some do not.
One of the "hot" DataNodes (which is a pod) got to 100% disk usage and couldn't rejoin the cluster.
How can that happen?
Shouldn't the flood_stage watermark mark the indices with shards on that node as read-only and stop all writes to the disk?
Is there a configuration I somehow could have missed or misconfigured?
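One way to rule out a misconfiguration is to look at the watermark values the cluster is actually using. The following is a minimal Python sketch, not a definitive tool: it takes the JSON body returned by `GET _cluster/settings?include_defaults=true&flat_settings=true` and picks out the three disk watermark settings, letting transient values override persistent ones and persistent values override defaults. The function name `effective_watermarks` and the `sample_response` dictionary are made up for illustration; the sample only reuses the documented default values and is not output from this cluster.

```python
# Names of the three disk watermark settings in Elasticsearch 7.x.
WATERMARK_KEYS = (
    "cluster.routing.allocation.disk.watermark.low",
    "cluster.routing.allocation.disk.watermark.high",
    "cluster.routing.allocation.disk.watermark.flood_stage",
)


def effective_watermarks(settings_response):
    """Return the watermark values in effect for a flat cluster-settings body.

    Sections are applied in increasing order of precedence, so a transient
    value overrides a persistent one, which overrides the built-in default.
    """
    effective = {}
    for section in ("defaults", "persistent", "transient"):
        section_settings = settings_response.get(section, {})
        for key in WATERMARK_KEYS:
            value = section_settings.get(key)
            if value is not None:
                effective[key] = value
    return effective


# Hypothetical response body, hand-written for illustration only.
sample_response = {
    "persistent": {},
    "transient": {},
    "defaults": {
        "cluster.routing.allocation.disk.watermark.low": "85%",
        "cluster.routing.allocation.disk.watermark.high": "90%",
        "cluster.routing.allocation.disk.watermark.flood_stage": "95%",
    },
}

for name, value in effective_watermarks(sample_response).items():
    print(name, "=", value)
```

If a value shows up under `persistent` or `transient`, someone has overridden the default at some point, which would be worth checking before looking elsewhere.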
Welcome to our community!
How much disk space do your nodes have? Because yes, it should stop writing, but there's always a bit of wiggle room due to things like merges or shard reallocation that may still happen. So if your node has only 5GB free at 95% and your indices are tens of GB in size, then that may explain it. But I am guessing there without more info.
It isn't necessarily only ES filling your disk. Other things might be contributing as well: paging, logs, other processes.
Thanks for the welcome, and for the quick response.
My "hot" nodes have 7TB of disk each (a persistent volume connected to the pod).
So 5% of that is 350GB.
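The headroom arithmetic in this exchange can be sketched in a few lines of Python. This is a back-of-the-envelope illustration only: it computes the free space left when a disk reaches the flood-stage percentage and compares it with the size of a single operation (a merge or a shard relocation) that might still land on the node. The function names and the 30 GB figure for a pending operation are assumptions for illustration; only the 95% default, the 5GB example, and the 7TB disk size come from the thread.

```python
def free_at_flood_stage(disk_gb, flood_stage_pct=95.0):
    """Free space (in GB) remaining when usage reaches the flood-stage percentage."""
    return disk_gb * (100.0 - flood_stage_pct) / 100.0


def could_overrun(disk_gb, pending_gb, flood_stage_pct=95.0):
    """True if one pending merge or relocation is larger than the remaining headroom."""
    return pending_gb > free_at_flood_stage(disk_gb, flood_stage_pct)


# Hypothetical size of one merge or shard relocation, in GB.
pending_gb = 30

# 100 GB disk: the "5GB at 95%" case. 7000 GB disk: the 7TB hot nodes.
for disk_gb in (100, 7000):
    headroom = free_at_flood_stage(disk_gb)
    print(
        f"{disk_gb} GB disk: {headroom:.0f} GB free at flood stage, "
        f"overrun by a {pending_gb} GB operation: {could_overrun(disk_gb, pending_gb)}"
    )
```

On the small disk a single operation of that size is enough to exhaust the headroom, while on a 7TB disk it would take far more than one ordinary merge, which is why the 350GB figure makes the outage surprising.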
Thanks for the response.
The filesystem that filled up is used only for ES data, since it's a persistent volume that the pod mounts at the data directory.
The mount point is: