Curator: Delete oldest indices based on ES cluster size

Not sure whether this is a how-to or a feature request.

I have a requirement to keep as much time-series data as possible in our central logging system. Our log event rates vary wildly throughout the year. Deleting indices over a certain age is not really a good solution to this problem. Neither is deleting indices over a certain size, because some months are much larger than others.

We have now reached capacity, and I typically delete the oldest indices manually to keep the cluster inside a size threshold. I've calculated my safe threshold as 25.5 TB of the 40 TB of total storage. That leaves headroom for shard reallocation and keeps performance acceptable on spinning disks.

The filter I envision would get the current size of the cluster and compare it to the threshold set in the filter. If the cluster size is greater than the threshold, the filter would use an index name and date pattern to select indices. Working from the oldest and largest of them, it would determine which indices need to be deleted to keep the cluster below the safe threshold.
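The selection described above can be sketched in a few lines of Python. This is a hypothetical illustration, not Curator code. It assumes each index is known by its name, a date used for its age, and its size in bytes. It deletes indices oldest first, breaking date ties by larger size, until the remaining total is at or below the threshold. `IndexInfo` and `indices_to_delete` are made-up names.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class IndexInfo:
    # Hypothetical record: index name, the date used for its age, size on disk.
    name: str
    day: date
    size_bytes: int


def indices_to_delete(indices, threshold_bytes):
    """Return names of the oldest indices to delete so the total fits the threshold."""
    total = sum(i.size_bytes for i in indices)
    doomed = []
    # Oldest first; for indices with the same date, the larger one goes first.
    for idx in sorted(indices, key=lambda i: (i.day, -i.size_bytes)):
        if total <= threshold_bytes:
            break
        doomed.append(idx.name)
        total -= idx.size_bytes
    return doomed


TB = 10**12
cluster = [
    IndexInfo("logs-2017.01", date(2017, 1, 1), 3 * TB),
    IndexInfo("logs-2017.02", date(2017, 2, 1), 5 * TB),
    IndexInfo("logs-2017.03", date(2017, 3, 1), 4 * TB),
]
print(indices_to_delete(cluster, 6 * TB))  # ['logs-2017.01', 'logs-2017.02']
```

With 12 TB in use and a 6 TB threshold, removing the two oldest indices brings the total down to 4 TB, so the loop stops there.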

My goal is to use Curator, a consistent and feature-rich tool, to do this. After reading the documentation, I could not find any configuration settings for cluster-size disk limits. While searching, I found a now-closed GitHub issue that matched my use case. Perhaps this functionality is already built in, but I can't see how to use it to meet my requirements:

Delete old indices when disk space limit has been exceeded for all cluster #573

Can anyone shed some light on this for me? In the meantime, I have a batch script that prunes the older indices based on a size comparison, and that works fine. However, I would prefer to leverage Curator's other features, and a single tool for all of this would be ideal. I'm hoping I'm just not understanding how to do this with the appropriate filters.

An action file like this should suffice (if all indices match a given prefix):

    ---
    actions:
      1:
        action: delete_indices
        description: >-
          Delete indices matching the prefix MY_INDEX_PREFIX_HERE in excess of
          25.5TB of data, starting with the oldest indices, based on index creation_date.
          An empty index list (from no indices being in excess of the size limit, for
          example) will not generate an error.
        options:
          ignore_empty_list: True
          timeout_override: 300
          continue_if_exception: False
          disable_action: False
        filters:
        - filtertype: pattern
          kind: prefix
          value: MY_INDEX_PREFIX_HERE
        - filtertype: space
          disk_space: 25500
          use_age: True
          source: creation_date

I suggest running Curator with the --dry-run flag to see the output before running it against your production cluster. This set of filters first filters by index prefix, which prevents accidentally deleting your .kibana index, for example. It then filters by the amount of disk space consumed, sorted by age (using the index creation_date as the timestamp), so the oldest indices are deleted first. Note that disk_space is expressed in gigabytes, so 25500 corresponds to roughly 25.5 TB. If your indices have a date stamp in the index name, you could also use source: name. The documentation for the space filtertype describes other ways of determining age, if desired.
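The space filter with use_age: True appears to work as follows. It sorts the matched indices newest first and keeps a running total of their sizes. Once that total exceeds the disk_space limit, it leaves the current index and every older one in the actionable list, so those get deleted. The Python below is a simplified, hypothetical re-creation of that logic, not Curator's actual code. Treating disk_space as gigabytes of 2**30 bytes is an assumption of this sketch. `space_filter` and the sample indices are made up.

```python
GIB = 2**30  # assumption: disk_space is read as gigabytes of 2**30 bytes


def space_filter(indices, disk_space_gb):
    """Return index names left in the actionable list (i.e. to delete).

    indices: list of (name, age_epoch_seconds, size_bytes) tuples.
    """
    limit = disk_space_gb * GIB
    used = 0
    actionable = []
    # Newest first, so recent indices consume the space budget first.
    for name, _age, size in sorted(indices, key=lambda i: i[1], reverse=True):
        used += size
        # Once the running total passes the limit, this index and every
        # older one stay in the list for deletion.
        if used > limit:
            actionable.append(name)
    return actionable


# Made-up example: three 10 GiB indices and a 25 GiB budget.
idx = [
    ("logs-a", 100, 10 * GIB),
    ("logs-b", 200, 10 * GIB),
    ("logs-c", 300, 10 * GIB),
]
print(space_filter(idx, 25))  # ['logs-a']
```

Sorting newest first means the most recent data always fits inside the budget. Only the index that crosses the limit, and anything older, is selected.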

Thank you @theuntergeek, that seems to return exactly what I asked for. I'll need a bit of regex trickery so the pattern doesn't match our yearly indices.

Do your yearly indices have the same prefix as your monthly indices? That makes things a bit more difficult.

You could theoretically filter these with a pattern filter using kind: regex. You'd need a regex that matches only the yearly indices, e.g. value: '^mypattern-\d{4}$', in conjunction with exclude: True. This excludes only indices whose names are mypattern- followed by exactly four digits, with no . and month number after the year.

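Be sure to enclose the regex in single quotes, as above. In YAML, backslashes inside single-quoted strings are taken literally. In a double-quoted string, \d would be treated as an escape sequence and rejected as invalid.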

Your filter block would now look like:

    filters:
      - filtertype: pattern
        kind: prefix
        value: 'mypattern-'
      - filtertype: pattern
        kind: regex
        value: '^mypattern-\d{4}$'
        exclude: True
      - filtertype: space
        disk_space: 25500
        source: creation_date
        use_age: True

As always, use --dry-run first to make sure it properly excludes the yearly indices.
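You can sanity-check the two pattern filters outside Curator by trying the same regex with Python's `re` module. This sketch mirrors the filter order in the block above and uses made-up index names. It is only a quick test of the expressions, not how Curator is invoked. The raw string `r"..."` plays the same role as the YAML single quotes: it keeps `\d` intact.

```python
import re

# Made-up index names for testing the expressions.
names = [
    "mypattern-2016",
    "mypattern-2017.01",
    "mypattern-2017.02",
    "otherpattern-2017.01",
    ".kibana",
]

prefix = "mypattern-"
yearly = re.compile(r"^mypattern-\d{4}$")

# Step 1 (kind: prefix): keep only names starting with the prefix.
# Step 2 (kind: regex, exclude: True): drop names matching the yearly regex.
kept = [n for n in names if n.startswith(prefix) and not yearly.search(n)]
print(kept)  # ['mypattern-2017.01', 'mypattern-2017.02']
```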

I did manage to get them matching correctly using something like the filter below. However, there's one thing I didn't consider. We've replayed some of our logs after changing our Logstash filters to add enrichment data, so an index's creation date doesn't line up with the actual data it contains. I'm currently trying to get the space filter to use the timestring for determining age, but I'm not having much luck.

    filters:
    - filtertype: pattern
      kind: regex
      value: '^mypattern.*-[0-9]{4}\.[0-9]{2}$'
    - filtertype: space
      disk_space: 25500
      use_age: True
      source: creation_date

In that case, you could use source: field_stats with field: '@timestamp' in your age filter. This determines the min (default) or max value of a timestamp field in each index, using the field_stats API. Read more in the Curator documentation.

I should have said, "In your space filter"
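The difference between stats_result: min_value (the default) and max_value can be illustrated with a small, hypothetical sketch. Given the @timestamp values stored in an index, the age reference is either the earliest or the latest of them. The timestamps and creation date below are invented to mimic a replayed index, whose creation date is recent but whose documents are old. `field_age` is a made-up helper, not a Curator function.

```python
from datetime import datetime, timezone

# Hypothetical @timestamp values from documents in a replayed January index.
timestamps = [
    datetime(2017, 1, 3, 8, 0, tzinfo=timezone.utc),
    datetime(2017, 1, 17, 12, 30, tzinfo=timezone.utc),
    datetime(2017, 1, 31, 23, 59, tzinfo=timezone.utc),
]
# Hypothetical creation date: the index was rebuilt months later.
creation_date = datetime(2017, 6, 1, tzinfo=timezone.utc)


def field_age(values, stats_result="min_value"):
    """Pick the reference time the way stats_result selects min or max."""
    if stats_result == "min_value":
        return min(values)
    if stats_result == "max_value":
        return max(values)
    raise ValueError(f"unknown stats_result: {stats_result}")


print(field_age(timestamps))               # 2017-01-03 08:00:00+00:00
print(field_age(timestamps, "max_value"))  # 2017-01-31 23:59:00+00:00
print(creation_date)                       # 2017-06-01 00:00:00+00:00
```

Either statistic places this index in January. The creation date would wrongly make it look like one of the newest indices.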

I tried field_stats, but I'm not getting any matches with it.

    filters:
      - filtertype: pattern
        kind: regex
        value: '^simpleprefix-.*$'
      - filtertype: space
        disk_space: 25500
        use_age: True
        source: field_stats
        field: '@timestamp'
        stats_result: max_value

FWIW, I've also had no luck using source: name with a timestring to determine index age from the index name.

I'd be interested in seeing some of the DEBUG output to find out why. Set loglevel: DEBUG in the logging section of your Curator client configuration, and the logs should show the calculated dates and the point of reference.

Hi @theuntergeek,

Thanks again for the clarification. I got both the source: name and source: field_stats approaches working this morning. My guess is that I had disable_action: True set in my original action file, which prevented the action from running at all. To document my working examples for others:

  • Delete oldest indices using filtertype: space and source: name
    actions:
      1:
        action: delete_indices
        description: >-
          Delete monthly indices with timestring suffix, matched with the regex
          MYPREFIX in excess of 25.5TB of data, starting with the oldest indices,
          based on index timestring suffix.
          An empty index list (from no indices being in excess of the size limit,
          for example) will not generate an error.
        options:
          ignore_empty_list: True
          timeout_override: 300
          continue_if_exception: False
          disable_action: False
        filters:
        - filtertype: pattern
          kind: regex
          value: '^MYPREFIX.*-\d{4}\.\d{2}$'
        - filtertype: space
          disk_space: 25500
          use_age: True
          source: name
          timestring: '%Y.%m'
  • Delete oldest indices using filtertype: space and source: field_stats
    actions:
      1:
        action: delete_indices
        description: >-
          Delete monthly indices, matched with the regex MYPREFIX in excess of
          25.5TB of data, starting with the oldest indices, based on field_stats
          @timestamp field max_value.
          An empty index list (from no indices being in excess of the size limit,
          for example) will not generate an error.
        options:
          ignore_empty_list: True
          timeout_override: 300
          continue_if_exception: False
          disable_action: False
        filters:
        - filtertype: pattern
          kind: regex
          value: '^MYPREFIX.*-\d{4}\.\d{2}$'
        - filtertype: space
          disk_space: 25500
          use_age: True
          source: field_stats
          field: '@timestamp'
          stats_result: max_value
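For source: name, the timestring has to be turned into a pattern that can be found in the index name, and the matched text then parsed as a date. The sketch below shows that idea for '%Y.%m'. It uses a hand-written mapping of a few strftime codes to regex fragments. The mapping and the `age_from_name` function are assumptions for illustration, not Curator's internals.

```python
import re
from datetime import datetime

# Assumed mapping from a few strftime codes to regex fragments.
CODE_TO_REGEX = {"%Y": r"\d{4}", "%m": r"\d{2}", "%d": r"\d{2}"}


def age_from_name(index_name, timestring):
    """Return the datetime encoded in index_name, or None if it has none."""
    # Escape literal characters such as '.', then swap in the date fragments.
    pattern = re.escape(timestring)
    for code, fragment in CODE_TO_REGEX.items():
        pattern = pattern.replace(re.escape(code), fragment)
    match = re.search(pattern, index_name)
    if match is None:
        return None
    return datetime.strptime(match.group(0), timestring)


print(age_from_name("MYPREFIX-web-2017.03", "%Y.%m"))  # 2017-03-01 00:00:00
print(age_from_name("MYPREFIX-web", "%Y.%m"))          # None
```

A monthly timestring yields the first day of that month. That is enough to order indices by age even when their creation dates were reset by a replay.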
