Issues when running Curator

waldo · July 28, 2016, 7:46pm

Hello All,

We run Curator where I work, and we see two issues when running it:

The command I give it errors out, so I have to run it several times in order to delete all of the indices that meet the criteria I give it.
While the command is running, the cluster health goes to red rather than staying green. Once the command is complete, the status goes back to green.

This is the command I am using. I also run the same command from cron.

/usr/bin/curator --host elasticsearch-host --http_auth curator:(key goes here) delete indices --older-than 7 --time-unit days --timestring '%Y.%m.%d' --prefix 'logstash-'

Any ideas as to why this would be happening with the command provided?

Thank you

theuntergeek · July 28, 2016, 8:01pm

This will be very hard to troubleshoot without some indication of what errors you're seeing. Can you put the logs into a gist and link them here? Debug logs would be more helpful, but normal logs will likely reveal much.

Also, can you tell me which version of Elasticsearch you're operating against?

Without seeing those, I'm guessing that you have a rather large number of shards per node, and that the cluster state is getting beat up trying to delete them all in real time. There will likely be all kinds of timeouts visible in the logs.

waldo · July 28, 2016, 8:53pm

Hello,

Thank you for your reply. We currently are running Elasticsearch 2.2.3.

Here is the logfile you requested. Please let me know if there is anything else you need:
https://gist.github.com/tdotgreenshirt/9e0e00da57248e432bc6148ddf9abd2a#file-curator-log

Thank you

theuntergeek · July 28, 2016, 9:17pm

It's exactly what I thought it was. You're trying to delete an enormous number of indices in big chunks, and the server is timing out. See line #130 of your gist:

2016-01-25 19:08:31,926 ERROR     Got a TIMEOUT response from Elasticsearch.  Error message: HTTPConnectionPool(host=u'xxx1-elasticsearch-prod-vip', port=9200): Read timed out. (read timeout=30)

For cluster stability, I recommend deleting such a large number of indices in smaller batches. Since you have indices from last year, I recommend trying to delete the oldest month's worth first, then slowly iterate forward a month at a time.
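
A minimal sketch of that batched approach, assuming the same Curator 3.x-style command line used in the original post. The values START_DAYS, STEP_DAYS, KEEP_DAYS and PAUSE_SECONDS, the CURATOR_AUTH variable, and the RUN dry-run switch are all illustrative and not from the thread; adjust them to how far back your indices actually go. By default the script only prints the commands it would run.

```shell
#!/bin/sh
# Sketch: delete old logstash- indices in steps, oldest first.
# All values below are assumptions for illustration, not from the thread.
START_DAYS=210      # first pass removes indices older than this many days
STEP_DAYS=30        # move forward roughly one month per pass
KEEP_DAYS=7         # final threshold, as in the original command
PAUSE_SECONDS=60    # give the cluster time to settle between passes
CURATOR_AUTH=${CURATOR_AUTH:-curator:CHANGEME}   # placeholder credentials
RUN=${RUN-echo}     # dry run by default; set RUN= (empty) to really execute

thresholds() {
    # Print the --older-than values from the largest down to KEEP_DAYS.
    d=$START_DAYS
    while [ "$d" -gt "$KEEP_DAYS" ]; do
        echo "$d"
        d=$((d - STEP_DAYS))
    done
    echo "$KEEP_DAYS"
}

for older_than in $(thresholds); do
    $RUN /usr/bin/curator --host elasticsearch-host --http_auth "$CURATOR_AUTH" \
        delete indices --older-than "$older_than" --time-unit days \
        --timestring '%Y.%m.%d' --prefix 'logstash-'
    $RUN sleep "$PAUSE_SECONDS"
done
```

Each pass only has to remove the indices that the previous pass left behind, so no single request has to delete everything at once.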

If you want to try the painful, all-at-once approach, which may cause the cluster to become overburdened and very slow to respond, you still can. You'll need to increase the --timeout value on the command line. The default is 30 seconds. You can go as high as 300 seconds for an all-at-once shot, but even that may not be enough. In the 300-second range you'll be fighting the "master" timeout as well, not just the client timeout. The master timeout is how long the master node is permitted to take before responding. Curator sets it to match the --timeout (client timeout) value, up to 300 seconds, and will not increase it beyond that point. It's not a good thing to have a master node be so unresponsive.
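
For reference, a sketch of the original command with a longer client timeout, again assuming the Curator 3.x-style command line from the first post, where --timeout sits with the other global options before the delete command. The CURATOR_AUTH variable and the RUN dry-run switch are illustrative additions, not part of Curator.

```shell
#!/bin/sh
# Sketch: the original delete command with the client timeout raised.
TIMEOUT=300                                      # seconds; the default is 30
CURATOR_AUTH=${CURATOR_AUTH:-curator:CHANGEME}   # placeholder credentials
RUN=${RUN-echo}     # dry run by default; set RUN= (empty) to really execute

# --timeout is given alongside --host and --http_auth, before "delete".
$RUN /usr/bin/curator --timeout "$TIMEOUT" --host elasticsearch-host \
    --http_auth "$CURATOR_AUTH" delete indices \
    --older-than 7 --time-unit days \
    --timestring '%Y.%m.%d' --prefix 'logstash-'
```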

waldo · July 28, 2016, 10:42pm

Thanks for the help!
