We run a self-hosted Elasticsearch cluster that accepts log feeds from multiple sources. In a recent incident, a problem with a single source brought down the entire cluster and caused data loss for all the other sources.
Due to a Logstash configuration error, one source suddenly flooded the cluster with a huge amount of data in a short period. We have a large disk buffer, but it was still exhausted. Every node ran out of disk space and the cluster ground to a halt. It took some time to fix the problem, and the incoming data from that window was lost.
It feels like a problem with a single source should not take down the whole cluster, so we want to look into capping the total disk usage of each source's indices.
On our cluster, each source creates one index per day, named source_name-YYYY-MM-DD, so we can get the total disk usage of a source with “du -chs /indices/source_name-* | grep total”.
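To make that concrete, here is a small sketch that reports usage for every source at once. The /indices path and the source_name-YYYY-MM-DD naming are taken from our setup; INDICES_DIR is just a hypothetical override so the script can be tried elsewhere.

```shell
#!/usr/bin/env bash
# Sketch: report total on-disk size per source, assuming all indices live
# under one directory and are named source_name-YYYY-MM-DD.
export LC_ALL=C                       # keep du's "total" line in English
INDICES_DIR="${INDICES_DIR:-/indices}"

usage_per_source() {
  # Derive the set of source names by stripping the -YYYY-MM-DD suffix
  ls "$INDICES_DIR" 2>/dev/null | sed 's/-[0-9]\{4\}-[0-9]\{2\}-[0-9]\{2\}$//' | sort -u |
  while read -r src; do
    # Human-readable grand total across all daily indices of this source
    total=$(du -chs "$INDICES_DIR/$src"-* 2>/dev/null | grep 'total$' | cut -f1)
    echo "$src $total"
  done
}

usage_per_source
```

Run periodically (e.g. from cron), this would give a per-source baseline before any quota enforcement is added.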
Tentatively, we are thinking of a watchdog script that checks this and closes or deletes the indices of any source that exceeds its quota. Is there an existing tool for this?
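A minimal sketch of what we have in mind, to make the question concrete. The quota value, the /indices path, and the ES endpoint are all placeholders for illustration; the script drops the oldest daily indices first, since the YYYY-MM-DD suffix sorts chronologically.

```shell
#!/usr/bin/env bash
# Watchdog sketch: once a source's indices exceed a per-source quota,
# close its oldest daily indices until it is back under the limit.
# INDICES_DIR, QUOTA_KB and ES_URL are assumptions -- adjust to taste.
INDICES_DIR="${INDICES_DIR:-/indices}"
QUOTA_KB="${QUOTA_KB:-10485760}"      # per-source quota in KiB (10 GiB here)
ES_URL="${ES_URL:-http://localhost:9200}"
DRY_RUN="${DRY_RUN:-0}"               # 1 = only print what would be closed

check_source() {
  local source="$1" used dir idx size
  # Sum the on-disk size (KiB) of all daily indices for this source
  used=$(du -ks "$INDICES_DIR/$source"-* 2>/dev/null | awk '{s+=$1} END {print s+0}')
  if [ "$used" -le "$QUOTA_KB" ]; then return 0; fi
  # Over quota: plain sort walks the indices oldest-first
  for dir in $(ls -d "$INDICES_DIR/$source"-* 2>/dev/null | sort); do
    idx=$(basename "$dir")
    size=$(du -ks "$dir" | cut -f1)
    if [ "$DRY_RUN" = "1" ]; then
      echo "would close $idx ($size KiB)"
    else
      # Closing stops reads/writes but keeps data on disk;
      # switch to "curl -XDELETE" if the goal is to reclaim space.
      curl -s -XPOST "$ES_URL/$idx/_close" >/dev/null
    fi
    used=$((used - size))
    if [ "$used" -le "$QUOTA_KB" ]; then break; fi
  done
}

if [ $# -ge 1 ]; then check_source "$1"; fi
```

One caveat we are aware of: closing an index does not free disk space, only deleting does, so the close/delete choice above is really a retention-policy decision.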
I am also curious how Elasticsearch Cloud handles disk usage, given that multiple clients presumably share its hosts. Does it deploy one virtual machine per client, or something similar?
Feedback and pointers would be highly appreciated.