Curator delete indices action stops intermittently

Hi,

I am able to delete indices using Curator, but I intermittently get the error below, and then the process stops.

2019-05-21 16:08:22,022 ERROR     curator.actions.delete_indices         _verify_result:489  The following indices failed to delete on try #1:
2019-05-21 16:08:22,033 ERROR     curator.actions.delete_indices         _verify_result:492  ---winlogbeat-6.5.4-2019.03.03
2019-05-21 16:08:22,033 ERROR     curator.actions.delete_indices         _verify_result:492  ---winlogbeat-6.5.4-2016.10.31
2019-05-21 16:08:22,033 ERROR     curator.actions.delete_indices         _verify_result:492  ---winlogbeat-6.5.4-2017.02.28
2019-05-21 16:08:22,033 ERROR     curator.actions.delete_indices         _verify_result:492  ---winlogbeat-6.5.4-2018.12.28
2019-05-21 16:08:22,033 ERROR     curator.actions.delete_indices         _verify_result:492  ---winlogbeat-6.5.4-2018.12.29
2019-05-21 16:08:22,044 INFO      curator.actions.delete_indices           __chunk_loop:509  ---deleting index winlogbeat-6.5.4-2017.02.28
2019-05-21 16:08:22,044 INFO      curator.actions.delete_indices           __chunk_loop:509  ---deleting index winlogbeat-6.5.4-2018.12.28
2019-05-21 16:08:22,044 INFO      curator.actions.delete_indices           __chunk_loop:509  ---deleting index winlogbeat-6.5.4-2018.12.29
2019-05-21 16:08:22,044 INFO      curator.actions.delete_indices           __chunk_loop:509  ---deleting index winlogbeat-6.5.4-2017.02.28
2019-05-21 16:08:22,044 INFO      curator.actions.delete_indices           __chunk_loop:509  ---deleting index winlogbeat-6.5.4-2018.12.28
2019-05-21 16:08:22,044 INFO      curator.actions.delete_indices           __chunk_loop:509  ---deleting index winlogbeat-6.5.4-2018.12.29
2019-05-21 16:12:28,063 ERROR                curator.cli                    run:191  Failed to complete action: delete_indices.  <class 'curator.exceptions.FailedExecution'>: Exception encountered.  Rerun with loglevel DEBUG and/or check Elasticsearch logs for more information. Exception: NotFoundError(404, u'index_not_found_exception', u'no such index')

Please check the delete indices action file below.

---
# Remember, leave a key empty if there is no value.  None will be a string,
# not a Python "NoneType"
#
# Also remember that all examples have 'disable_action' set to True.  If you
# want to use this action as a template, be sure to set this to False after
# copying it.
actions:
  1:
    action: delete_indices
    description: >-
      Delete indices older than 45 days (based on index name), for winlogbeat-
      prefixed indices. Ignore the error if the filter does not result in an
      actionable list of indices (ignore_empty_list) and exit cleanly.
    options:
      timeout_override: 300
      ignore_empty_list: True
      disable_action: False
    filters:
    - filtertype: pattern
      kind: prefix
      value: winlogbeat-
    - filtertype: age
      source: name
      direction: older
      timestring: '%Y.%m.%d'
      unit: days
      unit_count: 21

And here is the Curator configuration file.

---
# Remember, leave a key empty if there is no value.  None will be a string,
# not a Python "NoneType"
client:
  hosts:
    - 10.0.45.65
  port: 9200
  url_prefix:
  use_ssl: False
  certificate:
  client_cert:
  client_key:
  ssl_no_validate: False
  http_auth:
  timeout: 30
  master_only: False

logging:
  loglevel: DEBUG
  logfile: /home/appadmin/curator.log
  logformat: default
  blacklist: ['elasticsearch', 'urllib3']

Is there some configuration I am missing?

Regards,
Nikhil

This error typically happens when your cluster is overburdened. A large batch of deletes does not complete in time, so the underlying API calls are retried automatically. Meanwhile the original delete eventually finishes, and the retry fails with a 404 because the index is already gone.

When this happens, it is usually because your cluster is undersized: too many shards across too few data nodes, no dedicated master nodes (so a busy combined data/master node cannot keep up with the cluster state changes), or some other variation on the underpowered cluster theme.
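
If you want to sanity-check that, a quick look at cluster health shows whether cluster-state updates are queuing up behind the master. This is only a rough sketch, not something from your setup; it assumes a 6.x/7.x elasticsearch-py client installed locally and the host/port from the curator.yml you posted above.

from elasticsearch import Elasticsearch

# Point the client at the same node Curator talks to.
es = Elasticsearch(["10.0.45.65:9200"])

health = es.cluster.health()
print("active shards:         %s" % health["active_shards"])
print("data nodes:            %s" % health["number_of_data_nodes"])
print("pending cluster tasks: %s" % health["number_of_pending_tasks"])

# A persistently non-zero pending-task count while the deletes are running
# suggests the master cannot keep up with the cluster state changes.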

Hi,

As you said, we have around 4k shards distributed across our 3-node cluster. That is exactly why I was deleting indices in the first place.

I was able to complete the task after a couple of retries. Thanks anyway :blush:

And how much RAM is set aside for the JVM heap for Elasticsearch on each of these 3 nodes?

As a guideline, you should not have more than 20 open, non-frozen shards per gigabyte of heap. For example, with 30G heaps that would be no more than 600 shards per node, or 1,800 across your 3 nodes. If your nodes have smaller heaps, that ceiling drops sharply.
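
If it helps, here is a rough sketch (again assuming a 6.x/7.x elasticsearch-py client and the same host as above; adjust to taste) that compares each node's shard count against its heap using that 20-shards-per-GB guideline:

from elasticsearch import Elasticsearch

es = Elasticsearch(["10.0.45.65:9200"])

# Shards currently allocated to each node, from _cat/allocation.
alloc = es.cat.allocation(format="json")
shards_per_node = {
    row["node"]: int(row["shards"])
    for row in alloc
    if row["node"] != "UNASSIGNED"
}

# Maximum heap per node, from the nodes stats JVM metrics.
stats = es.nodes.stats(metric="jvm")
for node in stats["nodes"].values():
    heap_gb = node["jvm"]["mem"]["heap_max_in_bytes"] / 1024.0 ** 3
    shards = shards_per_node.get(node["name"], 0)
    print("%s: %d shards on a %.0fG heap (%.1f shards/GB, guideline <= 20)"
          % (node["name"], shards, heap_gb, shards / heap_gb))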
