Hi,
I have a cluster with 10 data nodes. Some of the data nodes have multiple disks of different sizes (we run on-prem, so I have to make do with what I'm given).
During the night my main index got locked due to the 'flood stage disk watermark'. My flood-stage limit is configured to 50GB, but when I checked the nodes in Kibana / 'GET /_cat/nodes' I couldn't see any node lacking disk space.
In the end it turned out that a single disk on one of my machines had filled up (99%). This machine has two disks (600GB and 3.5TB) and the smaller one was the one that filled. I find this behaviour a bit strange, as I have no control over which shard goes to which disk, and the rest of the cluster had plenty of disk space. I would expect an ES node to know how to manage its own disks and not lock the entire index/cluster when this could have been avoided.
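(For reference, 'GET /_cat/nodes' only reports a single total per node; as far as I can tell the per-disk breakdown only shows up in the node filesystem stats, where each data path is listed separately with its own free/available bytes. Something like:

    GET /_nodes/stats/fs?filter_path=nodes.*.name,nodes.*.fs.data

would have shown the 600GB path as nearly full even though the node as a whole looked fine.)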
My only option now, if I want to use this extra disk, is to run 2 ES instances on this machine (1 per disk), which is a waste of RAM and CPU.
Any thoughts/ideas on this? Have I missed something? Should I open a bug?
Yes, Elasticsearch should have avoided this by relocating shards, assuming that was allowed and it had time to do so between breaching the high and flood-stage watermarks. If you ingest data quickly enough, it's possible to go from below the high watermark (no action needed) to above the flood-stage watermark (index marked as read-only) before it can react. However, without more detail on exactly why Elasticsearch got into this state I don't think it makes sense to open a bug report, since I don't think I can reproduce this from what you've said so far.
From 7.4.0 onwards the read-only block is automatically removed when disk space is freed, which might have helped here.
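On earlier versions the block has to be cleared manually once space has been freed, along these lines (substitute your own index name; setting the value to null removes the block):

    PUT /my-index/_settings
    {
      "index.blocks.read_only_allow_delete": null
    }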
You're right that there is some extra overhead in running multiple ES instances, but it's not normally much: each of the instances is typically smaller than the single one would have been. Another option is to use RAID or LVM or similar to combine your disks into a single filesystem.
My configuration is 50GB/200GB/300GB for the flood/high/low watermarks. I have a single index being written to and many other read-only indices (using the rollover API). As I said before, I have 10 data nodes in the cluster with varying disk sizes, from 1.5TB to 4.3TB. The "big" machines have multiple disks. All shards are ~50GB (give or take 1GB).
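For completeness, those limits correspond to settings along these lines (shown here via the cluster settings API, though they can equally live in elasticsearch.yml; the absolute values are thresholds on free space per disk):

    PUT /_cluster/settings
    {
      "persistent": {
        "cluster.routing.allocation.disk.watermark.low": "300gb",
        "cluster.routing.allocation.disk.watermark.high": "200gb",
        "cluster.routing.allocation.disk.watermark.flood_stage": "50gb"
      }
    }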
Monitoring is also enabled on this cluster, so I have lots of system indices (Kibana, ES, Logstash).
I have not modified any cluster configuration other than the limits stated above. I did notice that the disk that got filled is the first one listed in the path.data array. This issue doesn't seem to happen on other machines with multiple disks where the disks are the same size.
Is there any other information I can provide? Also, are there any recommendations for such setups (a single index rolling over, many data nodes with varying disk sizes)?
Ideally I'd like to see the shard allocation and recovery stats (e.g. GET _cat/shards and GET _cat/recovery) taken periodically for some time before/during the disk-full event, and maybe some output from the allocation explain API explaining why no shards could be relocated.
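Concretely, something like the following, captured every few minutes around the disk-full event, would be ideal (the index/shard/primary values in the allocation-explain body are just placeholders; point it at a shard you'd have expected to move off the full disk):

    GET /_cat/shards?v
    GET /_cat/recovery?v&active_only=true

    # placeholders: pick a specific shard to explain
    POST /_cluster/allocation/explain
    {
      "index": "my-index",
      "shard": 0,
      "primary": true
    }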