Curator Snapshot Error with Closed indices


(Kate) #1

The snapshot action is failing on closed indices.

I have a Curator action file that runs multiple actions. I am not sure whether I am doing it correctly, or whether the order of the actions affects how they run.

  1. allocation
  2. forcemerge
  3. close
  4. snapshot
  5. delete

When the cron job for Curator runs, it performs actions 1, 2 and 3, but when the snapshot action starts I get this error:
ERROR Failed to complete action: snapshot. <class 'curator.exceptions.FailedExecution'>: Exception encountered. Rerun with loglevel DEBUG and/or check Elasticsearch logs for more information. Exception: TransportError(403, u'cluster_block_exception', u'blocked by: [FORBIDDEN/4/index closed];')

To work around this error, I re-opened the indices and then ran the snapshot, and it went fine.
But when my script runs again it will fail, because it will close the indices again.
Is there a way for the snapshot action to ignore closed indices? Or is there a better solution?

actions:
  1:
    action: allocation
    description: "Apply shard allocation filtering rules to the specified indices. Move files older than 2 days"
    options:
      key: node_type
      value: warm
      allocation_type: require
      wait_for_completion: true
      timeout_override:
      ignore_empty_list: true
      continue_if_exception: false
      disable_action: false
    filters:
    - filtertype: pattern
      kind: prefix
      value: logstash-
    - filtertype: age
      source: name
      direction: older
      timestring: '%Y.%m.%d'
      unit: days
      unit_count: 2

  2:
    action: forcemerge
    description: >-
      forceMerge logstash- prefixed indices older than 1 day (based on index
      creation_date) to 1 segments per shard. Delay 120 seconds between each
      forceMerge operation to allow the cluster to quiesce.
      This action will ignore indices already forceMerged to the same or fewer
      number of segments per shard, so the 'forcemerged' filter is unneeded.
    options:
      max_num_segments: 1
      delay: 120
      timeout_override:
      ignore_empty_list: true
      continue_if_exception: false
      disable_action: False
    filters:
    - filtertype: pattern
      kind: prefix
      value: logstash-
      exclude:
    - filtertype: age
      source: creation_date
      direction: older
      unit: days
      unit_count: 1
      exclude:

  3:
    action: close
    description: >-
      Close indices older than 90 days (based on index name), for logstash- and syslog- prefixed indices.
    options:
      delete_aliases: False
      timeout_override:
      ignore_empty_list: true
      continue_if_exception: false
      disable_action: False
    filters:
    - filtertype: pattern
      kind: prefix
      value: logstash-
      exclude:
    - filtertype: age
      source: name
      direction: older
      timestring: '%Y.%m.%d'
      unit: days
      unit_count: 90
      exclude:

  4:
    action: snapshot
    description: "Snapshot logstash- older than 91 day based on index creation_date.Wait for snapshot to complete."
    options:
      repository: es_backups
      name: logstash-%Y%m%d%H%M%S
      ignore_unavailable: false
      include_global_state: True
      partial: False
      wait_for_completion: True
      skip_repo_fs_check: False
      disable_action: False
    filters:
    - filtertype: pattern
      kind: prefix
      value: logstash-
    - filtertype: age
      source: name
      timestring: '%Y.%m.%d'
      direction: older
      unit: days
      unit_count: 91

  5:
    action: delete_indices
    description: >-
      Delete indices older than 91 days (based on index name), logstash- and syslog-
      prefixed indices. Ignore the error if the filter does not result in an
      actionable list of indices (ignore_empty_list) and exit cleanly.
    options:
      ignore_empty_list: True
      timeout_override:
      continue_if_exception: False
      disable_action: False
    filters:
    - filtertype: pattern
      kind: prefix
      value: logstash-
      exclude:
    - filtertype: age
      source: name
      direction: older
      timestring: '%Y.%m.%d'
      unit: days
      unit_count: 91
      exclude:

Specifications
Version: ES Version 6.0.0
Platform: Ubuntu
Curator 5.5


(Aaron Mildenstein) #2

Elasticsearch cannot snapshot a closed index, which the error message you received makes plain.

You have two options. Which you choose will depend on the desired outcome.

  1. Re-order your actions so snapshot comes before close. Actions are performed in the order in which they appear.
  2. Add a closed filter to your snapshot filter list.

Option 1 guarantees that the indices you just forcemerged get snapshotted. This is what I presume you wanted to have happen. Option 2 will simply tell the snapshot action to ignore closed indices. If you want the closed indices to be snapshotted, this is perhaps not the option you want to go with. You should simply re-order the actions to close the indices after they've been snapshotted.
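
As a rough sketch of Option 2, the snapshot action from the original post could carry an extra `closed` filter, which removes closed indices from the actionable list when `exclude` is `True`. The repository, snapshot name and age values are copied from the post above; the shortened description is only illustrative, and the fragment has not been run against this cluster. Option 1 needs no new settings at all: the `close` block is simply moved below the `snapshot` block in the file.

```yaml
actions:
  4:
    action: snapshot
    description: "Snapshot open logstash- indices older than 91 days."
    options:
      repository: es_backups
      name: logstash-%Y%m%d%H%M%S
      ignore_unavailable: False
      include_global_state: True
      partial: False
      wait_for_completion: True
      skip_repo_fs_check: False
      disable_action: False
    filters:
    - filtertype: pattern
      kind: prefix
      value: logstash-
    - filtertype: age
      source: name
      direction: older
      timestring: '%Y.%m.%d'
      unit: days
      unit_count: 91
    # Option 2: drop closed indices so the snapshot never touches them.
    - filtertype: closed
      exclude: True
```

With this filter in place, any index that is already closed is silently skipped, so it will not be included in the snapshot.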


(Kate) #3

I tried option 1 and it works on the first run, but the next time the cron job runs it will fail, because there are closed indices by then. Will the delete action delete the closed indices too?


(Aaron Mildenstein) #4

The delete_indices action will only delete indices you have identified by your filters.

Is snapshotting at older than 91 days just to save the indices before deletion? Since you are snapshotting and then deleting indices older than 91 days, and closing indices older than 90 days, it seems to me that the close action is unnecessary. Keeping an index closed for only 1 day is not going to change much.

These numbers are rather large. Can your cluster handle 90 days of open indices, in terms of shard count per node?


(Kate) #5

Yes, the snapshot is to save the indices before deletion (somewhat like a backup). And yes, my cluster can handle 90 days of open indices.

So from your perspective, there is no point in closing the indices.

Thank you, I am learning something...


(Aaron Mildenstein) #6

Closing indices is something to do if your cluster couldn't handle that many open indices, even though your storage can hold them. Each open shard has a resource cost (heap space), so the more you have, the more memory constrained your nodes become for indexing operations. In a cluster that could hold 90 days' worth of indices, but could not keep more than 30 days' worth of indices in an open state, you'd snapshot at 30 days, then close the indices to keep them present, but not used.
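
A minimal sketch of that 30-day scenario, assuming the same `es_backups` repository and `logstash-` prefix as the original post (the descriptions and the choice to filter on index name are illustrative, not taken from a working setup). The snapshot comes first and skips indices that earlier runs have already closed, then the close action runs on the same age range:

```yaml
actions:
  1:
    action: snapshot
    description: "Snapshot open logstash- indices older than 30 days."
    options:
      repository: es_backups
      name: logstash-%Y%m%d%H%M%S
      wait_for_completion: True
      ignore_empty_list: True
    filters:
    - filtertype: pattern
      kind: prefix
      value: logstash-
    - filtertype: age
      source: name
      direction: older
      timestring: '%Y.%m.%d'
      unit: days
      unit_count: 30
    # Indices closed by a previous run cannot be snapshotted, so skip them.
    - filtertype: closed
      exclude: True
  2:
    action: close
    description: "Close logstash- indices older than 30 days, after the snapshot."
    options:
      ignore_empty_list: True
    filters:
    - filtertype: pattern
      kind: prefix
      value: logstash-
    - filtertype: age
      source: name
      direction: older
      timestring: '%Y.%m.%d'
      unit: days
      unit_count: 30
```

Because the snapshot runs before the close, each index is captured once while it is still open; on later runs it is already closed and drops out of the snapshot's list.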


(Kate) #7

Thank you for the clarification master @theuntergeek :smiley:

