Hi Everyone,
We've been having some issues with our Elasticsearch cluster hitting 100% disk usage on one or more nodes.
It is a three-node cluster with two master/data nodes and one voting-only node.
The nodes have a PV created by the Local Volume Provisioner storage class.
[root@k8s02 storage]# df -h /var/elasticsearch/storage/
Filesystem                     Size  Used Avail Use% Mounted on
/dev/mapper/almalinux-elastic 1019G  677G  342G  67% /var/elasticsearch/storage
[root@k8s02 storage]# kubectl get pv
NAME                CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS   CLAIM                                            STORAGECLASS    REASON   AGE
local-pv-7db3cb5    969Gi      RWO            Delete           Bound    default/elasticsearch-data-elastic-es-master-0   local-storage            170d
local-pv-9a836200   969Gi      RWO            Delete           Bound    default/elasticsearch-data-elastic-es-master-1   local-storage            170d
We have more than 10 clusters in production. On every other cluster, when a node hits the flood stage watermark, all the indices get the index.blocks.read_only_allow_delete block set as expected. This cluster is the only one where that does not happen.
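In case it helps anyone reproduce the check, here is a minimal sketch of how the block could be verified per index. It assumes the JSON body of a `GET /_all/_settings` request has already been fetched and parsed into a dict. The helper name `indices_missing_block` and the sample index names are made up for illustration and are not from our cluster.

```python
def indices_missing_block(settings_response):
    """Return the names of indices that do NOT have read_only_allow_delete set."""
    missing = []
    for index_name, body in settings_response.items():
        blocks = body.get("settings", {}).get("index", {}).get("blocks", {})
        # Setting values come back as strings, e.g. "true".
        value = str(blocks.get("read_only_allow_delete", "false")).lower()
        if value != "true":
            missing.append(index_name)
    return sorted(missing)


# Hypothetical sample shaped like a `GET /_all/_settings` response.
sample = {
    "logs-a": {"settings": {"index": {"blocks": {"read_only_allow_delete": "true"}}}},
    "logs-b": {"settings": {"index": {"number_of_replicas": "1"}}},
}
print(indices_missing_block(sample))
```

On a healthy cluster that has hit the flood stage, I would expect this list to be empty for indices with shards on the full node.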
There are no notable log entries in any of the ELK stack components, nor in the OS logs of the nodes running Kubernetes. This has already happened to this cluster a few times, forcing us to increase the disk space each time.
Does anyone have any idea what the reason could be?
Here are all the cluster settings configured on the node, as they appear in its log:
[2025-01-03T09:36:57,957][INFO ][o.e.c.s.ClusterSettings ] [elastic-es-master-1] updating [cluster.routing.allocation.node_concurrent_incoming_recoveries] from [2] to [4]
[2025-01-03T09:36:57,957][INFO ][o.e.c.s.ClusterSettings ] [elastic-es-master-1] updating [cluster.routing.allocation.disk.watermark.high] from [90%] to [94%]
[2025-01-03T09:36:57,958][INFO ][o.e.c.s.ClusterSettings ] [elastic-es-master-1] updating [cluster.routing.allocation.disk.watermark.flood_stage] from [95%] to [97%]
[2025-01-03T09:36:57,958][INFO ][o.e.c.s.ClusterSettings ] [elastic-es-master-1] updating [cluster.routing.allocation.disk.watermark.high.max_headroom] from [150GB] to [-1]
[2025-01-03T09:36:57,958][INFO ][o.e.c.s.ClusterSettings ] [elastic-es-master-1] updating [cluster.routing.allocation.disk.watermark.flood_stage.max_headroom] from [100GB] to [-1]
[2025-01-03T09:36:57,958][INFO ][o.e.c.s.ClusterSettings ] [elastic-es-master-1] updating [cluster.max_shards_per_node] from [1000] to [3000]
[2025-01-03T09:36:57,959][INFO ][o.e.c.s.ClusterSettings ] [elastic-es-master-1] updating [cluster.routing.allocation.disk.watermark.low] from [85%] to [92%]
[2025-01-03T09:36:57,959][INFO ][o.e.c.s.ClusterSettings ] [elastic-es-master-1] updating [cluster.routing.allocation.disk.watermark.low.max_headroom] from [200GB] to [-1]
[2025-01-03T09:36:57,959][INFO ][o.e.c.s.ClusterSettings ] [elastic-es-master-1] updating [cluster.routing.allocation.disk.watermark.high] from [90%] to [94%]
[2025-01-03T09:36:57,959][INFO ][o.e.c.s.ClusterSettings ] [elastic-es-master-1] updating [cluster.routing.allocation.disk.watermark.high.max_headroom] from [150GB] to [-1]
[2025-01-03T09:36:57,959][INFO ][o.e.c.s.ClusterSettings ] [elastic-es-master-1] updating [cluster.routing.allocation.disk.watermark.flood_stage] from [95%] to [97%]
[2025-01-03T09:36:57,959][INFO ][o.e.c.s.ClusterSettings ] [elastic-es-master-1] updating [cluster.routing.allocation.disk.watermark.flood_stage.max_headroom] from [100GB] to [-1]
[2025-01-03T09:36:57,960][INFO ][o.e.c.s.ClusterSettings ] [elastic-es-master-1] updating [cluster.routing.allocation.disk.watermark.low] from [85%] to [92%]
[2025-01-03T09:36:57,960][INFO ][o.e.c.s.ClusterSettings ] [elastic-es-master-1] updating [cluster.routing.allocation.disk.watermark.low.max_headroom] from [200GB] to [-1]
[2025-01-03T09:36:57,960][INFO ][o.e.c.s.ClusterSettings ] [elastic-es-master-1] updating [cluster.routing.allocation.disk.watermark.high] from [90%] to [94%]
[2025-01-03T09:36:57,960][INFO ][o.e.c.s.ClusterSettings ] [elastic-es-master-1] updating [cluster.routing.allocation.disk.watermark.high.max_headroom] from [150GB] to [-1]
[2025-01-03T09:36:57,960][INFO ][o.e.c.s.ClusterSettings ] [elastic-es-master-1] updating [cluster.routing.allocation.disk.watermark.flood_stage] from [95%] to [97%]
[2025-01-03T09:36:57,960][INFO ][o.e.c.s.ClusterSettings ] [elastic-es-master-1] updating [cluster.routing.allocation.disk.watermark.flood_stage.max_headroom] from [100GB] to [-1]
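To put the percentages above into absolute numbers, here is a rough sketch of how much free space is left on the 1019G filesystem when each watermark is reached. It assumes that a max_headroom of -1 means no headroom cap is applied, so only the percentage counts. The function name `free_space_at` is just for illustration.

```python
# Filesystem size in G, as reported by `df -h` above.
FS_SIZE_GB = 1019

# Watermark percentages taken from the ClusterSettings log lines.
WATERMARKS = {"low": 92, "high": 94, "flood_stage": 97}


def free_space_at(watermark_pct, size_gb=FS_SIZE_GB):
    """Free space (in G) remaining once disk usage reaches watermark_pct."""
    return size_gb * (100 - watermark_pct) / 100


for name, pct in WATERMARKS.items():
    print(f"{name}: {pct}% used -> {free_space_at(pct):.2f}G free")
```

So under that assumption the flood stage should only kick in with roughly 30G left on the volume, compared to 100G if the default headroom had stayed in place.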
Any help is greatly appreciated!
Cheers,
Luka