Disk quota on Elasticsearch for indices from different sources?

We have a self-hosted Elasticsearch cluster accepting log feeds from multiple sources. In a recent incident, a problem with a single source caused the entire cluster to fail and led to data loss for all the other sources.

Due to a Logstash configuration error, one feeding source suddenly went wild and sent a huge amount of data within a short period. We have a very large disk buffer, but it was still eaten up. All the nodes filled their disks and the cluster ground to a halt. It took some time to fix the problem, and the incoming data during that gap was lost.

It feels like a problem with a single source should not bring down the entire cluster, so we want to look into restricting the total disk usage of the indices from each source.

On our cluster, each source creates one index per day with the naming pattern source_name-YYYY-MM-DD, and we can get the total disk usage per source with the command “du -chs /indices/source_name-* | grep total”.
Tentatively, we are thinking of a watchdog script that checks this and closes or deletes the indices of any source that exceeds its quota. Is there an existing tool for this?
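
To make the idea concrete, something along the following lines is what we had in mind. It is only a sketch: the endpoint, quota value, and source names are placeholders, and it assumes (as in the du command above) that the on-disk directory names match the index names.

```
#!/usr/bin/env bash
# Rough watchdog sketch (not production code).
# Assumes index data sits under /indices and Elasticsearch answers on localhost:9200.

ES_URL="http://localhost:9200"              # placeholder endpoint
QUOTA_BYTES=$((50 * 1024 * 1024 * 1024))    # example quota: 50 GiB per source
SOURCES="source_a source_b"                 # placeholder source names

for src in $SOURCES; do
  # Total on-disk size (in bytes) of all daily indices for this source
  used=$(du -cbs /indices/${src}-* 2>/dev/null | awk '/total$/ {print $1}')
  [ -z "$used" ] && continue

  if [ "$used" -gt "$QUOTA_BYTES" ]; then
    # The YYYY-MM-DD suffixes sort lexicographically, so the first entry is the oldest
    oldest=$(ls -d /indices/${src}-* | sort | head -n 1 | xargs -n 1 basename)
    echo "$(date) ${src} is over quota (${used} bytes); closing ${oldest}"
    curl -s -X POST "${ES_URL}/${oldest}/_close"
    # or, to actually free the disk space: curl -s -X DELETE "${ES_URL}/${oldest}"
  fi
done
```

One caveat we would have to keep in mind: closing an index keeps its data on disk, so only deleting would actually bring usage back under the quota. The _cat/indices API (e.g. GET _cat/indices/source_name-*?bytes=b&h=index,store.size) could also replace the du call if querying the cluster is preferable to reading the filesystem.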

I am also wondering how Elasticsearch Cloud handles disk usage, assuming there are multiple clients on their shared hosts. Does it deploy one virtual machine for each client, or something similar?

Feedback and pointers would be highly appreciated.

I haven't seen anything for this specific use case. Generally, retention is cluster/node wide.
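
The closest built-in control is the disk-based allocation watermarks, which are cluster/node wide rather than per source. Something like the following (the thresholds are only examples, and the flood_stage setting needs a reasonably recent version):

```
# Example only: cluster-wide disk watermarks, no per-source granularity.
# Once a node crosses flood_stage, Elasticsearch marks the indices with
# shards on that node read-only rather than letting the disk fill completely.
curl -s -X PUT "http://localhost:9200/_cluster/settings" \
  -H 'Content-Type: application/json' \
  -d '{
    "transient": {
      "cluster.routing.allocation.disk.watermark.low": "85%",
      "cluster.routing.allocation.disk.watermark.high": "90%",
      "cluster.routing.allocation.disk.watermark.flood_stage": "95%"
    }
  }'
```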

Thank you for your input, Mark.

I am also still wondering how the cloud service handles disk space, assuming there are multiple clients on a shared host. Does it deploy one virtual machine for each client, or something similar?

Any further details would be very helpful.

I found a related post on GitHub; the link is below for reference.

Cloud does it per node, which is a container. So really it's handled as it would be for any other node.

Thank you for your help, Mark.

With multiple feeding sources, is it more common to use separate nodes for each source? We also have security requirements. If we move in this direction, is there a way to build an internal cloud service, for example with Docker?

Or is there an existing product for this use case?

Elastic Cloud Enterprise does this use case very well.

However, you could DIY with Docker too.
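
As a rough sketch of the DIY route (the image tag, ports, and host paths below are just placeholders), running one container per source with its own data volume keeps the failure domains separate:

```
# Sketch only: one single-node Elasticsearch container per feeding source,
# each writing to its own host volume, so one runaway source can only
# fill its own volume.
docker run -d --name es-source-a \
  -e "discovery.type=single-node" \
  -v /data/es-source-a:/usr/share/elasticsearch/data \
  -p 9201:9200 \
  docker.elastic.co/elasticsearch/elasticsearch:7.17.0

docker run -d --name es-source-b \
  -e "discovery.type=single-node" \
  -v /data/es-source-b:/usr/share/elasticsearch/data \
  -p 9202:9200 \
  docker.elastic.co/elasticsearch/elasticsearch:7.17.0
```

Each Logstash pipeline would then point at its own endpoint, and the per-source quota becomes a matter of sizing the underlying volume or filesystem.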

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.