Delete after Total Disk Space Reaches Threshold


(Michael Li Zhou) #1

I found that Curator can do many tasks, but delete --disk-space works in a very specific way: it deletes indices that fall beyond a certain size threshold. Is there a way to check whether the total disk space consumed by indices has reached a threshold, and then delete the oldest indices? Thanks.

Mike


(Aaron Mildenstein) #2

Curator does delete by space, by "oldest."

If I am trying to delete logstash indices beyond 10 gigabytes of consumption, my command might be:

curator delete --disk-space 10 indices --prefix logstash

In trying to free up space, curator will delete the "oldest" logstash indices first.

Is the issue that you have multiple index name patterns? Curator cannot overcome this. The "oldest" sorting is really only alphabetical, so some index names would be deleted first no matter what, preserving others.
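To make the alphabetical-sorting point concrete, here is a simplified model in Python. It is not Curator's actual code: the function name, the index names, and the sizes are all made up for illustration, and it assumes the selection walks the names in reverse alphabetical order while adding up sizes, marking everything past the limit for deletion.

```python
def pick_deletions(sizes_gb, limit_gb):
    """Return the index names a by-space delete would select.

    sizes_gb: dict of index name -> size in GB (hypothetical data).
    limit_gb: total space the kept indices may consume.
    Simplified model only; "newest" here just means "sorts last by name".
    """
    running_total = 0.0
    to_delete = []
    # Walk from the alphabetically last name backwards, accumulating size.
    for name in sorted(sizes_gb, reverse=True):
        running_total += sizes_gb[name]
        if running_total > limit_gb:
            # Everything past the limit is selected for deletion.
            to_delete.append(name)
    return sorted(to_delete)


# Two name patterns, two days each, 4 GB per index, 10 GB limit.
example = {
    "apache-2015.08.09": 4,
    "apache-2015.08.10": 4,
    "logstash-2015.08.09": 4,
    "logstash-2015.08.10": 4,
}
selected = pick_deletions(example, 10)
print(selected)
```

With these made-up numbers both apache indices are selected, including the newer one, while the older logstash index survives. That is the effect of sorting by name rather than by date.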


(Aaron Mildenstein) #3

By the way, delete by space is a very tricky thing to get right in Elasticsearch as shard sizes may be (a little or very) unequal in your use-case. You may wind up with global disk space at the levels you want, but individual nodes may be much more full than others. This can be somewhat overcome by changing your Elasticsearch configuration to balance the shards by space consumed instead of shard count. This approach can have other, unintended consequences, though.

See the caveats listed in the Curator documentation.


(Michael Li Zhou) #4

Yeah, I read through the caveats and only the performance one applies. But back to the problem: I can totally use that command with some additional flags, but what I am looking for is this:

(1) max of 8 indices per day
(2) delete indices older than X days (this may be a requirement, plus it makes the task easier)

Use Case 1:
Say I have so far 3-5 days' worth of data totaling about 20 GB, and my basic system has a limit of 22 GB. If the total index size reaches 21 GB, I want to delete the oldest day of indices. Right now, what I can work out is to run 2 commands:

(1) delete indices older than X days
(2) if any index is larger than 2 GB (dividing the space evenly among the indices), delete those that are also older than X days

This will ensure that I (1) will not go over the limit and (2) delete indices that will not be looked at because they are X days old. Great, problem solved.

But I want to optimize this process so that if the total index size reaches 21 GB, I delete the last indexed day. The reason I don't want to just delete indices that go beyond 2 GB is that they all correlate: if I delete one of the indices from August 9, 2015, then all the other indices from that day are irrelevant. Thanks.

Mike


(Aaron Mildenstein) #5

The Curator command-line tool is not able to correlate indices. You may be able to do something like what you've described, but you'd have to write it yourself (you could use the Curator API to aid in accomplishing this). Future versions of Curator may include the ability to do date math based on index creation time, but that will still not correct for index correlation. It would still only delete what was asked, which could leave some indices undeleted.
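As a rough idea of what "write it yourself" could look like, here is a minimal Python sketch of the selection logic only. Everything in it is an assumption for illustration: the function name, the sample index names and sizes, and the premise that every correlated index embeds its day as YYYY.MM.DD in its name. It does not call Curator or Elasticsearch; you would have to fetch the real index sizes and issue the actual deletes through whatever client you use.

```python
import re
from collections import defaultdict

# Assumption: index names embed their day as YYYY.MM.DD.
DATE_RE = re.compile(r"(\d{4}\.\d{2}\.\d{2})")


def days_to_delete(sizes_gb, limit_gb):
    """Select whole days of indices, oldest first, until the total fits.

    sizes_gb: dict of index name -> size in GB (hypothetical data).
    limit_gb: total size at which deletion should kick in.
    Returns the index names to delete, in deletion order.
    """
    by_day = defaultdict(list)
    for name in sizes_gb:
        match = DATE_RE.search(name)
        if match:
            # Indices without a date in the name are never selected.
            by_day[match.group(1)].append(name)

    total = sum(sizes_gb.values())
    selected = []
    # YYYY.MM.DD strings sort chronologically, so sorted() is oldest first.
    for day in sorted(by_day):
        if total <= limit_gb:
            break
        # Take every index for that day so correlated indices go together.
        for name in sorted(by_day[day]):
            selected.append(name)
            total -= sizes_gb[name]
    return selected


# Made-up example: 22 GB in total against a 21 GB trigger.
sample = {
    "a-2015.08.09": 6,
    "b-2015.08.09": 5,
    "a-2015.08.10": 6,
    "b-2015.08.10": 5,
}
doomed = days_to_delete(sample, 21)
print(doomed)
```

Because selection happens per day rather than per index, a day is never left half-deleted, which is the correlation the command-line tool cannot provide.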

