I am using Curator in production. I have a total of 12 actions in my actions.yml,
but Curator throws an exception at action 9, so the remaining actions are not executed.
The following exception occurs:
2020-05-19 07:32:20,257 INFO Trying Action ID: 9, "delete_indices": Delete indices older than (based on index name), for logstash- prefixed indices. Ignore the error if the filter does not result in an actionable list of indices (ignore_empty_list) and exit cleanly.
2020-05-19 07:32:22,191 INFO Deleting 3 selected indices: ['logstash-2020.05.18.14', 'logstash-2020.05.18.13', 'logstash-2020.05.18.12']
2020-05-19 07:32:22,191 INFO ---deleting index logstash-2020.05.18.14
2020-05-19 07:32:22,191 INFO ---deleting index logstash-2020.05.18.13
2020-05-19 07:32:22,191 INFO ---deleting index logstash-2020.05.18.12
2020-05-19 07:32:24,460 ERROR Failed to complete action: delete_indices. <class 'curator.exceptions.FailedExecution'>: Exception encountered. Rerun with loglevel DEBUG and/or check Elasticsearch logs for more information. Exception: Failed to get indices. Error: NotFoundError(404, 'index_not_found_exception', 'no such index [logstash-2020.05.18.14]', logstash-2020.05.18.14, index_or_alias)
What this implies is that your cluster is being very slow to update the cluster state during/after the index delete steps, such that even after Curator has received an "okay, the index is deleted" message from the client connection, on a subsequent API call, the index still appears to be present. In such a case, Curator attempts to delete it again. Somehow, in the very few moments between Curator finding that the index still appears to be present and the attempt to re-delete the index, the cluster state finally refreshes and the index is truly gone, so you get a 404, index not found error.
As stated, this is a relatively rare occurrence in a fully performant cluster. This only tends to happen on clusters which are overtaxed and/or overloaded, which can be from one or more of the following (or other scenarios, too):
Too many shards per node
The cluster state is too large from having too many fields in one or more index mappings
The master nodes are both master & data, which can result in long Java garbage collection pauses on a master node, leading to the cluster state update race condition mentioned
You will need to set loglevel: DEBUG and include the more verbose output to demonstrate the retry I mentioned. Considerably more debugging will be necessary to track down why your cluster state is slow to update, resulting in that race condition.
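For reference, that setting lives in the logging section of the Curator client configuration (your config.yml). A minimal sketch, with placeholder connection values:

```yaml
# Curator client configuration (config.yml) -- sketch only; hosts/port are placeholders
client:
  hosts:
    - 127.0.0.1
  port: 9200
  timeout: 30

logging:
  loglevel: DEBUG    # verbose output, needed to see the retry behavior described above
  logfile:           # empty means log to stdout
  logformat: default
  blacklist: ['elasticsearch', 'urllib3']
```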
Thank you @theuntergeek for your valuable reply.
But for now I cannot change anything except updating the ConfigMaps.
Can I use some workaround that prevents the error, such as a suitable value for "timeout" in config.yml, or some other setting?
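What I have in mind is something like raising the client timeout in config.yml, for example (the host and the 120 value are only placeholders, not values I have tested):

```yaml
client:
  hosts:
    - elasticsearch.logging.svc   # placeholder host
  port: 9200
  timeout: 120                    # raised from the default of 30 seconds -- just an idea
```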
There are no tricks to fixing an overtaxed cluster other than to eliminate the bottlenecks. You either fix it, or it keeps causing problems.
As far as trying to make Curator proceed in the face of these cluster state update delays, you could (though I emphatically do not recommend it) configure Curator to ignore errors and just keep proceeding through actions. That's not a fix at all, though, and it could lead to other errors and unexpected outcomes.
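If you do decide to go that route anyway, the relevant per-action option in actions.yml is continue_if_exception. A minimal sketch, assuming your action 9 looks roughly like the description in the log (the filters below are placeholders for hourly logstash- indices, not your actual configuration):

```yaml
actions:
  9:
    action: delete_indices
    description: >-
      Delete logstash- prefixed indices older than one day (based on index name).
    options:
      ignore_empty_list: True
      continue_if_exception: True   # proceed to the next action even if this one fails -- masks real problems
      disable_action: False
    filters:
    - filtertype: pattern
      kind: prefix
      value: logstash-
    - filtertype: age
      source: name
      direction: older
      timestring: '%Y.%m.%d.%H'
      unit: hours
      unit_count: 24
```

Again, that only hides the symptom; the underlying cluster state slowness remains.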