Curator with multiple actions isn't deleting all specified indices

I am using Curator 4.0.0 to manage a bunch of indices, and when I try to specify several filters in a single actions.yml, it seems to do something odd.

I have four things going into ES: WinLogBeat, FileBeat, MetricBeat and PacketBeat. I want different timeouts for some of these, so I have the following actions.yml:

actions:
  1:
    action: delete_indices
    options:
      continue_if_exception: True
      disable_action: False
    filters:
    - filtertype: pattern
      kind: prefix
      value: winlogbeat-
    - filtertype: age
      source: name
      direction: older
      timestring: '%Y.%m.%d'
      unit: months
      unit_count: 1
  2:
    action: delete_indices
    options:
      continue_if_exception: True
      disable_action: False
    filters:
    - filtertype: pattern
      kind: prefix
      value: filebeat-
    - filtertype: age
      source: name
      direction: older
      timestring: '%Y.%m.%d'
      unit: months
      unit_count: 1
  3:
    action: delete_indices
    options:
      continue_if_exception: True
      disable_action: False
    filters:
    - filtertype: pattern
      kind: prefix
      value: metricbeat-
    - filtertype: age
      source: name
      direction: older
      timestring: '%Y.%m.%d'
      unit: months
      unit_count: 1
  4:
    action: delete_indices
    options:
      continue_if_exception: True
      disable_action: False
    filters:
    - filtertype: pattern
      kind: prefix
      value: packetbeat-
    - filtertype: age
      source: name
      direction: older
      timestring: '%Y.%m.%d'
      unit: days
      unit_count: 6

And if I run curator with that config, I get the "nothing to delete" error.

If I run it just with the last section, I correctly get a bunch of deleted indices:

  1:
    action: delete_indices
    options:
      continue_if_exception: True
      disable_action: False
    filters:
    - filtertype: pattern
      kind: prefix
      value: packetbeat-
    - filtertype: age
      source: name
      direction: older
      timestring: '%Y.%m.%d'
      unit: days
      unit_count: 6

Output:

2017-01-03 13:14:57,796 INFO      DRY-RUN: delete_indices: packetbeat-2016.12.14 with arguments: {}
2017-01-03 13:14:57,796 INFO      DRY-RUN: delete_indices: packetbeat-2016.12.15 with arguments: {}
2017-01-03 13:14:57,796 INFO      DRY-RUN: delete_indices: packetbeat-2016.12.16 with arguments: {}
2017-01-03 13:14:57,796 INFO      DRY-RUN: delete_indices: packetbeat-2016.12.17 with arguments: {}
2017-01-03 13:14:57,796 INFO      DRY-RUN: delete_indices: packetbeat-2016.12.18 with arguments: {}
2017-01-03 13:14:57,796 INFO      DRY-RUN: delete_indices: packetbeat-2016.12.19 with arguments: {}
2017-01-03 13:14:57,796 INFO      DRY-RUN: delete_indices: packetbeat-2016.12.20 with arguments: {}
2017-01-03 13:14:57,796 INFO      DRY-RUN: delete_indices: packetbeat-2016.12.21 with arguments: {}
2017-01-03 13:14:57,797 INFO      DRY-RUN: delete_indices: packetbeat-2016.12.22 with arguments: {}
2017-01-03 13:14:57,797 INFO      DRY-RUN: delete_indices: packetbeat-2016.12.23 with arguments: {}
2017-01-03 13:14:57,797 INFO      DRY-RUN: delete_indices: packetbeat-2016.12.24 with arguments: {}
2017-01-03 13:14:57,797 INFO      DRY-RUN: delete_indices: packetbeat-2016.12.25 with arguments: {}
2017-01-03 13:14:57,797 INFO      DRY-RUN: delete_indices: packetbeat-2016.12.26 with arguments: {}
2017-01-03 13:14:57,797 INFO      DRY-RUN: delete_indices: packetbeat-2016.12.27 with arguments: {}

You should upgrade to the latest version, which is currently 4.2.5, and use the ignore_empty_list option.
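
As a minimal sketch of that suggestion, the first action from the file above could gain the option as shown here. This assumes a Curator release that supports ignore_empty_list, such as the 4.2.5 mentioned above. The other three actions would need the same line.

```yaml
actions:
  1:
    action: delete_indices
    options:
      # An empty index list after filtering is treated as "nothing to do"
      # rather than as a failure, so the following actions still run.
      ignore_empty_list: True
      continue_if_exception: True
      disable_action: False
    filters:
    - filtertype: pattern
      kind: prefix
      value: winlogbeat-
    - filtertype: age
      source: name
      direction: older
      timestring: '%Y.%m.%d'
      unit: months
      unit_count: 1
```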

Ah, I didn't realise I was using an older version. I had an existing 4.0.0 installed, so the package version I installed was being ignored. I'll give that a go now.

That worked. For what it's worth, it's not made particularly clear anywhere that if one step "fails" due to no matching indices, none of the following ones are executed. Perhaps a note on the main page about actions could make it clear that each action will only be executed if the preceding step did not fail, regardless of the "continue_if_exception" setting.
I realise it makes sense for many other operations, but I suspect many people use curator for pruning old log-based indices, and wouldn't even think of "nothing to do" as an error condition.

I apologize for the confusion. I will see about updating the documentation. It is tricky. "Nothing to do" might be an error. That's why the ignore_empty_list option was added.

Oh yes - it wasn't a criticism, just a suggestion :) It's a bit confusing at the moment how continue_if_exception and ignore_empty_list work together.

If ignore_empty_list is false, it seems to result in an error for empty lists (as fits the description), but then "continue_if_exception" doesn't make it continue as I would expect it to.

This is what happens if there is an empty list:

ignore_empty_list continue_if_exception behaviour
false             false                 immediate abort, as expected
false             true                  immediate abort, but should continue because we chose to ignore this?
true              false                 continues, as there is no error. (I haven't tried this)
true              true                  empty list continues, as there is no error.

An empty list is a special condition, and it is treated differently, as you found out under unfortunate circumstances.

Suppose my filter block removes all indices from consideration. A good example of this is setting up Curator to purge daily indices after 30 days when I only have two weeks of indices so far because it's a brand new installation. The action cannot continue because there's nothing to act on. This is an error, but it's not normally a fatal one, like being unable to connect or getting a failed response from Elasticsearch. This action should work at "day 31," but that's not a reason to fail now.

On the other hand, if continue_if_exception caught all errors, including empty-list ones, then setting it to tolerate empty lists would also let Curator move on to the next action after a genuine failure. Suppose I wanted to optimize indices after a hot-to-warm shard reallocation, but the reallocation failed because of some error reported by Elasticsearch (an example I heard last week was that there wasn't enough room on the warm nodes to accept the new shards). If continue_if_exception were true, then I would begin an I/O-intensive optimize immediately after the failed relocation, and it would happen on my "hot" nodes, where I want every bit of I/O I can get.

This is why there are two separate exception types: one to catch only the empty-list exceptions, and one to catch everything else.
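
For the log-pruning case in this thread, one combination consistent with that reasoning is sketched below. It only illustrates how the two options can be set independently. Whether stopping on other errors is right for a given setup is an assumption that depends on what the later actions do.

```yaml
options:
  # Empty index list after filtering: log it and move on to the next action.
  ignore_empty_list: True
  # Any other exception (for example a connection failure or an error
  # response from Elasticsearch): stop instead of running later actions.
  continue_if_exception: False
  disable_action: False
```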

Cool, thanks for the examples. At the moment I'm only using ES for log data, so I don't quite have a handle on all the fancier things that can be done with it :)
