Curator throwing an error while restoring (Failed to complete action)

I have recently made some upgrades on my dev box:
ES 5.0 -> ES 5.4
elasticsearch-curator 4.1.0 -> elasticsearch-curator 5.1.1

The Curator upgrade automatically upgraded the Python elasticsearch package from 5.0.1 to 5.4.0.

Now when I try to restore from a backup, it fails. Here is a fragment of the error log:

2017-06-14 06:44:02,789 DEBUG curator.utils test_repo_fs:1134 Nodes with verified repository access: {u'xzcHEUF7QiaJ1cJgu84Iuw': {u'name': u'xzcHEUF'}}
2017-06-14 06:44:02,790 INFO curator.actions.snapshot do_action:1724 Restoring indices "['shopper_actions_2017_06_*']" from snapshot: curator-20170613215206
2017-06-14 06:44:03,679 DEBUG curator.utils wait_for_it:1581 Elapsed time: 0 seconds
2017-06-14 06:44:03,682 INFO curator.utils restore_check:1445 _recovery returned an empty response. Trying again.
2017-06-14 06:44:03,682 DEBUG curator.utils wait_for_it:1584 Response: False
2017-06-14 06:44:03,682 DEBUG curator.utils wait_for_it:1604 Action "restore" not yet complete, 9 total seconds elapsed. Waiting 9 seconds before checking again.
2017-06-14 06:44:12,691 DEBUG curator.utils wait_for_it:1581 Elapsed time: 9 seconds
2017-06-14 06:44:12,699 ERROR curator.cli cli:193 Failed to complete action: restore. <class 'curator.exceptions.FailedExecution'>: Exception encountered. Rerun with loglevel DEBUG and/or check Elasticsearch logs for more information. Exception: 'shopper_actions_2017_06_*'

When I run the restore with exact index names (e.g. shopper_actions_2017_06_01, shopper_actions_2017_06_02), it works properly, but when I give it a pattern (e.g. shopper_actions_2017_06_*) it starts failing. This seems related to this issue, which is already closed.

I'm afraid the problem you're experiencing doesn't relate to that issue, even if part of the message is similar. How do I know? That issue is resolved, and you can tell because Curator retried after that message:

Trying again.

While the multiple index syntax is apparently supported by Elasticsearch, the wildcard expansion is somehow not happening, or something else is getting in the way.
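
For reference, the restore request Curator builds can be reproduced directly; Elasticsearch itself accepts wildcard patterns in the `indices` field of the restore body, so if a direct call like the following succeeds, the failure is somewhere in Curator rather than in the restore itself. This is only a sketch using the elasticsearch-py client, and the repository name is a placeholder:

```
from elasticsearch import Elasticsearch

client = Elasticsearch(['http://127.0.0.1:9200'])

# Issue the same kind of restore Curator would build, directly against ES.
# "my_repo" is a placeholder; the snapshot name is the one from your log.
client.snapshot.restore(
    repository='my_repo',
    snapshot='curator-20170613215206',
    body={
        'indices': ['shopper_actions_2017_06_*'],
        'include_aliases': True,
        'include_global_state': True,
        'ignore_unavailable': False,
        'partial': False,
    },
    wait_for_completion=False,
)
# Then watch GET /<indices>/_recovery to follow the restore progress.
```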

What does your action file configuration look like? Specifically, the indices line? (exactly as it is, quotes and all, or no quotes, or escapes, or whatever you may have there)

What do the logs from Elasticsearch look like during that time? An error like that most likely produced a stack trace in the Elasticsearch logs, either on the client node you contacted, or on the elected master node (if you don't know which that is, check all of the master-eligible nodes), or both.

Looking through the code, I think I know what's happening here, and if I'm right, it's a bug. I would still appreciate your logs to help verify that.

I just added a test to the Curator test suite to restore indices using a wildcard, and it worked, so that rules out my first guess.

The test creates 3 indices, each prefixed with my_index. The indices line looks like this:

      indices: ['my_*']

And with that, the test works as expected.

So, please tell me what your indices line looks like.

Thanks for looking into the issue. My indices contain time-series data, with names like shopper_actions_2017_06_01, shopper_actions_2017_06_02, and so on. Since the restore was failing, I reduced it to a single index pattern, i.e. shopper_actions_*

Posting the options from the action file:

options:
  repository: test-repo
  # Leaving name blank will result in restoring the most recent snapshot by age
  name:
  # Leaving indices blank will result in restoring all indices in the snapshot
  indices: ["shopper_actions_2017_06_*"]
  include_aliases: True
  ignore_unavailable: False
  include_global_state: True
  partial: False
  rename_pattern:
  rename_replacement:
  extra_settings:
  wait_for_completion: True
  skip_repo_fs_check: False
  timeout_override: 12000
  continue_if_exception: False
  disable_action: False
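
For context, these options sit inside the usual Curator restore action layout. A rough sketch of the full action file follows; the filters shown are only the common pattern for selecting the most recent successful curator snapshot and may not match the actual file exactly:

```
actions:
  1:
    action: restore
    description: >-
      Restore shopper_actions indices from the most recent successful snapshot.
    options:
      repository: test-repo
      name:
      indices: ["shopper_actions_2017_06_*"]
      # ... remaining options exactly as listed above ...
      wait_for_completion: True
    filters:
    - filtertype: pattern
      kind: prefix
      value: curator-
    - filtertype: state
      state: SUCCESS
```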

The problem isn't likely to be the action file, then. Please provide the requested logs.

Please paste the important lines between triple back-ticks, like this:

``` 
LOGS GO HERE
```

It seems I can post only 7000 lines, so I am posting the last lines. Let me know if you want the full log; I will upload it somewhere.

2017-06-14 08:31:03,362 DEBUG     curator.actions.snapshot               __init__:1637 WAIT_FOR_COMPLETION: True
2017-06-14 08:31:03,362 DEBUG     curator.actions.snapshot               __init__:1639 SKIP_REPO_FS_CHECK: False
2017-06-14 08:31:03,362 DEBUG     curator.actions.snapshot               __init__:1640 BODY: {'ignore_unavailable': False, 'partial': False, 'include_aliases': True, 'rename_replacement': '', 'rename_pattern': '', 'indices': ['shopper_actions_2017_06_*'], 'include_global_state': True}
2017-06-14 08:31:03,362 DEBUG                curator.cli         process_action:98   Doing the action here.
2017-06-14 08:31:03,574 DEBUG              curator.utils           test_repo_fs:1132 All nodes can write to the repository
2017-06-14 08:31:03,574 DEBUG              curator.utils           test_repo_fs:1134 Nodes with verified repository access: {u'xzcHEUF7QiaJ1cJgu84Iuw': {u'name': u'xzcHEUF'}}
2017-06-14 08:31:03,576 INFO      curator.actions.snapshot              do_action:1724 Restoring indices "['shopper_actions_2017_06_*']" from snapshot: curator-20170613215206
2017-06-14 08:31:04,496 DEBUG              curator.utils            wait_for_it:1581 Elapsed time: 0 seconds
2017-06-14 08:31:04,499 INFO               curator.utils          restore_check:1445 _recovery returned an empty response. Trying again.
2017-06-14 08:31:04,499 DEBUG              curator.utils            wait_for_it:1584 Response: False
2017-06-14 08:31:04,499 DEBUG              curator.utils            wait_for_it:1604 Action "restore" not yet complete, 9 total seconds elapsed. Waiting 9 seconds before checking again.
2017-06-14 08:31:13,510 DEBUG              curator.utils            wait_for_it:1581 Elapsed time: 9 seconds
2017-06-14 08:31:13,517 ERROR                curator.cli                    cli:193  Failed to complete action: restore.  <class 'curator.exceptions.FailedExecution'>: Exception encountered.  Rerun with loglevel DEBUG and/or check Elasticsearch logs for more information. Exception: 'shopper_actions_2017_06_*'

ES logs:

[2017-06-14T00:03:17,687][INFO ][o.e.c.m.MetaDataCreateIndexService] [xzcHEUF] [metricbeat-2017.06.14] creating index, cause [auto(bulk api)], templates [default_template_1, metricbeat], shards [1]/[1], mappings [_default_]
[2017-06-14T00:03:17,900][INFO ][o.e.c.m.MetaDataMappingService] [xzcHEUF] [metricbeat-2017.06.14/fywRpeimRZGq8sTCFzxbZg] create_mapping [metricsets]
[2017-06-14T00:04:24,302][INFO ][o.e.c.m.MetaDataCreateIndexService] [xzcHEUF] [.monitoring-es-2-2017.06.14] creating index, cause [auto(bulk api)], templates [default_template_1, .monitoring-es-2], shards [1]/[1], mappings [_default_, node, shards, index_stats, index_recovery, cluster_state, cluster_stats, indices_stats, node_stats]
[2017-06-14T00:04:24,461][INFO ][o.e.c.m.MetaDataMappingService] [xzcHEUF] [.monitoring-es-2-2017.06.14/iK1ybPvWQye-USzhUWwJXA] update_mapping [shards]
[2017-06-14T00:04:24,724][INFO ][o.e.c.m.MetaDataMappingService] [xzcHEUF] [.monitoring-es-2-2017.06.14/iK1ybPvWQye-USzhUWwJXA] update_mapping [indices_stats]
[2017-06-14T00:04:24,762][INFO ][o.e.c.m.MetaDataMappingService] [xzcHEUF] [.monitoring-es-2-2017.06.14/iK1ybPvWQye-USzhUWwJXA] update_mapping [cluster_state]
[2017-06-14T00:04:24,799][INFO ][o.e.c.m.MetaDataMappingService] [xzcHEUF] [.monitoring-es-2-2017.06.14/iK1ybPvWQye-USzhUWwJXA] update_mapping [node]
[2017-06-14T00:04:24,863][INFO ][o.e.c.m.MetaDataMappingService] [xzcHEUF] [.monitoring-es-2-2017.06.14/iK1ybPvWQye-USzhUWwJXA] update_mapping [node_stats]
[2017-06-14T00:04:24,910][INFO ][o.e.c.m.MetaDataMappingService] [xzcHEUF] [.monitoring-es-2-2017.06.14/iK1ybPvWQye-USzhUWwJXA] update_mapping [index_stats]
[2017-06-14T00:04:25,262][INFO ][o.e.c.m.MetaDataMappingService] [xzcHEUF] [.monitoring-es-2-2017.06.14/iK1ybPvWQye-USzhUWwJXA] update_mapping [index_recovery]
[2017-06-14T00:04:25,312][INFO ][o.e.c.m.MetaDataMappingService] [xzcHEUF] [.monitoring-es-2-2017.06.14/iK1ybPvWQye-USzhUWwJXA] update_mapping [cluster_stats]
[2017-06-14T00:30:00,000][INFO ][o.e.x.m.MlDailyMaintenanceService] triggering scheduled [ML] maintenance tasks
...skipping...
[2017-06-14T08:46:06,285][INFO ][o.e.c.m.MetaDataDeleteIndexService] [xzcHEUF] [shopper_actions_2017_06_09/OT7QtxLrQAycJfu5ns0knQ] deleting index
[2017-06-14T08:46:06,285][INFO ][o.e.c.m.MetaDataDeleteIndexService] [xzcHEUF] [shopper_actions_2017_06_05/QdVuS6rwQEWBEqUOI-qHwg] deleting index
[2017-06-14T08:46:06,285][INFO ][o.e.c.m.MetaDataDeleteIndexService] [xzcHEUF] [shopper_actions_2017_06_02/nmWS7qZNQam9zc359TG7FA] deleting index
[2017-06-14T08:46:06,285][INFO ][o.e.c.m.MetaDataDeleteIndexService] [xzcHEUF] [shopper_actions_2017_06_01/6g9x8QEwTSe3hqFXY9gx0w] deleting index
[2017-06-14T08:46:06,285][INFO ][o.e.c.m.MetaDataDeleteIndexService] [xzcHEUF] [shopper_actions_2017_06_08/ZRgNwECYSEeydqQ1GXddQQ] deleting index
[2017-06-14T08:46:06,285][INFO ][o.e.c.m.MetaDataDeleteIndexService] [xzcHEUF] [shopper_actions_2017_06_07/9h3ltqZVRzWSTdyQjNfjTg] deleting index
[2017-06-14T08:46:06,285][INFO ][o.e.c.m.MetaDataDeleteIndexService] [xzcHEUF] [shopper_actions_2017_06_12/ytncQjLrS2-lQ9-RG7aDnQ] deleting index
[2017-06-14T08:46:06,285][INFO ][o.e.c.m.MetaDataDeleteIndexService] [xzcHEUF] [shopper_actions_2017_06_13/hdtNWLL4T7-QfFvziE9VuQ] deleting index
[2017-06-14T08:46:06,285][INFO ][o.e.c.m.MetaDataDeleteIndexService] [xzcHEUF] [shopper_actions_2017_06_03/lNVNDGLlQvep5KnnnwrrUA] deleting index
[2017-06-14T08:46:06,285][INFO ][o.e.c.m.MetaDataDeleteIndexService] [xzcHEUF] [shopper_actions_2017_06_06/XqQtqiEqTcCgGjuW7XcJBQ] deleting index
[2017-06-14T08:46:06,285][INFO ][o.e.c.m.MetaDataDeleteIndexService] [xzcHEUF] [shopper_actions_2017_06_10/d6JPekEuRXytHm4cuDxu0Q] deleting index
[2017-06-14T08:46:06,285][INFO ][o.e.c.m.MetaDataDeleteIndexService] [xzcHEUF] [shopper_actions_2017_06_11/hQBC7uv_RYKs-POsp_6ycQ] deleting index

I'm seeing the ES logs at 00:30 and then skipping to 08:46, but the Curator events are at 08:31. Can you post correlating times, please?

Also, the Curator log level should be increased. In the client configuration file, change these entries in the logging: section:

  loglevel: DEBUG
  blacklist: []

This will show the underlying client traffic as well.
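
For reference, the logging section lives in the client configuration file (often curator.yml) alongside the client block. A sketch, with placeholder host values; the default blacklist of ['elasticsearch', 'urllib3'] is what normally hides the client traffic:

```
client:
  hosts:
    - 127.0.0.1
  port: 9200
  timeout: 30

logging:
  loglevel: DEBUG
  logfile:            # empty means log to the console
  logformat: default
  blacklist: []       # default is ['elasticsearch', 'urllib3']
```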

Logs after removing es from the blacklist:

2017-06-14 08:56:59,431 INFO               curator.utils          restore_check:1445 _recovery returned an empty response. Trying again.
2017-06-14 08:56:59,431 DEBUG              curator.utils            wait_for_it:1584 Response: False
2017-06-14 08:56:59,431 DEBUG              curator.utils            wait_for_it:1604 Action "restore" not yet complete, 9 total seconds elapsed. Waiting 9 seconds before checking again.
2017-06-14 08:57:08,440 DEBUG              curator.utils            wait_for_it:1581 Elapsed time: 9 seconds
2017-06-14 08:57:08,441 DEBUG         urllib3.util.retry               from_int:200  Converted retries value: False -> Retry(total=False, connect=None, read=None, redirect=0, status=None)
2017-06-14 08:57:08,447 DEBUG     urllib3.connectionpool          _make_request:395  http://127.0.0.1:9200 "GET /shopper_actions_2017_06_*/_recovery?human=true HTTP/1.1" 200 4407
2017-06-14 08:57:08,447 INFO               elasticsearch    log_request_success:83   GET http://127.0.0.1:9200/shopper_actions_2017_06_*/_recovery?human=true [status:200 request:0.007s]
2017-06-14 08:57:08,447 DEBUG              elasticsearch    log_request_success:85   > None
2017-06-14 08:57:08,448 DEBUG              elasticsearch    log_request_success:86   < {"shopper_actions_2017_06_13":{"shards":[{"id":1,"type":"SNAPSHOT","stage":"INDEX","primary":true,"start_time":"2017-06-14T08:56:59.443Z","start_time_in_millis":1497430619443,"total_time":"9s","total_time_in_millis":9000,"source":{"repository":"sl_es_s3_repo","snapshot":"curator-20170613215206","version":"5.0.2","index":"shopper_actions_2017_06_13"},"target":{"id":"xzcHEUF7QiaJ1cJgu84Iuw","host":"in-dev01","transport_address":"127.0.0.1:9300","ip":"127.0.0.1","name":"xzcHEUF"},"index":{"size":{"total":"389.8mb","total_in_bytes":408825173,"reused":"0b","reused_in_bytes":0,"recovered":"55.5mb","recovered_in_bytes":58201183,"percent":"14.2%"},"files":{"total":60,"reused":0,"recovered":1,"percent":"1.7%"},"total_time":"8.9s","total_time_in_millis":8991,"source_throttle_time":"-1","source_throttle_time_in_millis":0,"target_throttle_time":"-1","target_throttle_time_in_millis":0},"translog":{"recovered":0,"total":0,"percent":"100.0%","total_on_start":0,"total_time":"0s","total_time_in_millis":0},"verify_index":{"check_index_time":"0s","check_index_time_in_millis":0,"total_time":"0s","total_time_in_millis":0}},{"id":0,"type":"SNAPSHOT","stage":"INDEX","primary":true,"start_time":"2017-06-14T08:56:59.452Z","start_time_in_millis":1497430619452,"total_time":"8.9s","total_time_in_millis":8992,"source":{"repository":"test-repo","snapshot":"curator-20170613215206","version":"5.0.2","index":"shopper_actions_2017_06_13"},"target":{"id":"xzcHEUF7QiaJ1cJgu84Iuw","host":"in-dev01","transport_address":"127.0.0.1:9300","ip":"127.0.0.1","name":"xzcHEUF"},"index":{"size":{"total":"387.8mb","total_in_bytes":406697088,"reused":"0b","reused_in_bytes":0,"recovered":"44.9mb","recovered_in_bytes":47173426,"percent":"11.6%"},"files":{"total":72,"reused":0,"recovered":46,"percent":"63.9%"},"total_time":"8.9s","total_time_in_millis":8987,"source_throttle_time":"-1","source_throttle_time_in_millis":0,"target_throttle_time":"-1","target_throttle_time_in_millis":0},"translog":{"recovered":0,"total":0,"percent":"100.0%","total_on_start":0,"total_time":"0s","total_time_in_millis":0},"verify_index":{"check_index_time":"0s","check_index_time_in_millis":0,"total_time":"0s","total_time_in_millis":0}}]},"shopper_actions_2017_06_12":{"shards":[{"id":1,"type":"SNAPSHOT","stage":"INDEX","primary":true,"start_time":"2017-06-14T08:56:59.457Z","start_time_in_millis":1497430619457,"total_time":"8.9s","total_time_in_millis":8987,"source":{"repository":"test-repo","snapshot":"curator-20170613215206","version":"5.0.2","index":"shopper_actions_2017_06_12"},"target":{"id":"xzcHEUF7QiaJ1cJgu84Iuw","host":"in-dev01","transport_address":"127.0.0.1:9300","ip":"127.0.0.1","name":"xzcHEUF"},"index":{"size":{"total":"645.4mb","total_in_bytes":676801972,"reused":"0b","reused_in_bytes":0,"recovered":"56.9mb","recovered_in_bytes":59734671,"percent":"8.8%"},"files":{"total":94,"reused":0,"recovered":2,"percent":"2.1%"},"total_time":"8.9s","total_time_in_millis":8981,"source_throttle_time":"-1","source_throttle_time_in_millis":0,"target_throttle_time":"-1","target_throttle_time_in_millis":0},"translog":{"recovered":0,"total":0,"percent":"100.0%","total_on_start":0,"total_time":"0s","total_time_in_millis":0},"verify_index":{"check_index_time":"0s","check_index_time_in_millis":0,"total_time":"0s","total_time_in_millis":0}},{"id":0,"type":"SNAPSHOT","stage":"INDEX","primary":true,"start_time":"2017-06-14T08:56:59.464Z","start_time_in_millis":1497430619464,"total_time":"8.9
s","total_time_in_millis":8981,"source":{"repository":"sl_es_s3_repo","snapshot":"curator-20170613215206","version":"5.0.2","index":"shopper_actions_2017_06_12"},"target":{"id":"xzcHEUF7QiaJ1cJgu84Iuw","host":"in-dev01","transport_address":"127.0.0.1:9300","ip":"127.0.0.1","name":"xzcHEUF"},"index":{"size":{"total":"646.6mb","total_in_bytes":678057972,"reused":"0b","reused_in_bytes":0,"recovered":"54.6mb","recovered_in_bytes":57299790,"percent":"8.5%"},"files":{"total":90,"reused":0,"recovered":6,"percent":"6.7%"},"total_time":"8.9s","total_time_in_millis":8965,"source_throttle_time":"-1","source_throttle_time_in_millis":0,"target_throttle_time":"-1","target_throttle_time_in_millis":0},"translog":{"recovered":0,"total":0,"percent":"100.0%","total_on_start":0,"total_time":"0s","total_time_in_millis":0},"verify_index":{"check_index_time":"0s","check_index_time_in_millis":0,"total_time":"0s","total_time_in_millis":0}}]}}
2017-06-14 08:57:08,450 ERROR                curator.cli                    cli:193  Failed to complete action: restore.  <class 'curator.exceptions.FailedExecution'>: Exception encountered.  Rerun with loglevel DEBUG and/or check Elasticsearch logs for more information. Exception: 'shopper_actions_2017_06_*'

Okay, I think I know what's going on now. The recovery check isn't expanding the wildcard, so it thinks there's something wrong, even though the recovery is working. I'll file a bug report for this.

The recovery response is correctly generated from the wildcard list of indices. The problem is that the recovery check is looking for an index named after the wildcard pattern, and that index doesn't exist.

The solution will be to extract the index names from the recovery response and check each of them for completion.
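
In rough terms, the check needs to iterate over the concrete index names that the _recovery response returns rather than looking up the wildcard string as if it were an index name. A sketch of the idea (illustrative only, not the actual Curator patch):

```
def restore_complete(client, index_pattern):
    """Illustrative wildcard-safe restore check (not Curator's actual code).

    Ask _recovery for the pattern, then require every shard of every
    concrete index in the response to have reached the DONE stage.
    """
    response = client.indices.recovery(index=index_pattern)
    if not response:
        # Recovery data may not be available yet; the caller should retry.
        return False
    for index_name, data in response.items():
        for shard in data['shards']:
            if shard['stage'] != 'DONE':
                return False
    return True

# e.g. poll until the restore of the wildcard pattern is finished:
#   while not restore_complete(client, 'shopper_actions_2017_06_*'):
#       time.sleep(9)
```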

See https://github.com/elastic/curator/pull/990 for the fix, plus the new test.

Thanks a lot for your time. :slight_smile:
