I'm trying to use Curator to shrink indices older than a certain number of days from 4 shards to 2. However, after it finishes moving all the shards to the same node, it never proceeds to the actual shrink. In Kibana's monitoring I can see that the shards have in fact been moved to one node correctly. If I call the _shrink API by hand, I can finish the process just fine, but Curator is unable to do the same.
At the INFO log level the log says practically nothing, but this is what I see in DEBUG mode:
2018-07-31 14:39:57,436 DEBUG curator.actions.shrink pre_shrink_check:2071 FINISH PRE_SHRINK_CHECK
2018-07-31 14:39:57,436 INFO curator.actions.shrink do_action:2122 Moving shards to shrink node: "node4"
2018-07-31 14:39:57,902 DEBUG curator.utils wait_for_it:1751 Elapsed time: 0 seconds
2018-07-31 14:39:57,902 DEBUG curator.utils health_check:1498 KWARGS= "{'relocating_shards': 0}"
2018-07-31 14:39:57,912 DEBUG curator.utils health_check:1512 NO MATCH: Value for key "0", health check data: 3
2018-07-31 14:39:57,912 DEBUG curator.utils wait_for_it:1754 Response: False
2018-07-31 14:39:57,912 DEBUG curator.utils wait_for_it:1774 Action "allocation" not yet complete, 0 total seconds elapsed. Waiting 9 seconds before checking again.
2018-07-31 14:40:06,921 DEBUG curator.utils wait_for_it:1751 Elapsed time: 9 seconds
2018-07-31 14:40:06,922 DEBUG curator.utils health_check:1498 KWARGS= "{'relocating_shards': 0}"
2018-07-31 14:40:06,961 DEBUG curator.utils health_check:1512 NO MATCH: Value for key "0", health check data: 2
2018-07-31 14:40:06,961 DEBUG curator.utils wait_for_it:1754 Response: False
2018-07-31 14:40:06,961 DEBUG curator.utils wait_for_it:1774 Action "allocation" not yet complete, 9 total seconds elapsed. Waiting 9 seconds before checking again.
And this goes on until I kill the process.
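To make the log easier to follow, here is a rough Python sketch of the polling pattern the DEBUG output suggests. It is not Curator's actual code: the names `health_matches` and `wait_for_health` are made up for illustration, and it assumes the check compares the `relocating_shards` value from a health response against 0 every `wait_interval` seconds, with `max_wait: -1` meaning no timeout.

```python
import time
from typing import Callable, Dict


def health_matches(health: Dict[str, int], **expected: int) -> bool:
    """Return True only if every expected key equals the value in the health data."""
    return all(health.get(key) == value for key, value in expected.items())


def wait_for_health(
    get_health: Callable[[], Dict[str, int]],
    wait_interval: int = 9,
    max_wait: int = -1,
    sleep: Callable[[float], None] = time.sleep,
    **expected: int,
) -> bool:
    """Poll get_health() until it matches `expected`.

    Hypothetical helper: max_wait == -1 is treated as "never time out",
    which is the assumption made about the config in this post.
    """
    elapsed = 0
    while True:
        if health_matches(get_health(), **expected):
            return True
        if max_wait != -1 and elapsed >= max_wait:
            return False
        sleep(wait_interval)
        elapsed += wait_interval


if __name__ == "__main__":
    # Simulated health responses, mirroring the 3 and 2 seen in the log.
    responses = iter(
        [{"relocating_shards": 3}, {"relocating_shards": 2}, {"relocating_shards": 0}]
    )
    done = wait_for_health(
        lambda: next(responses), sleep=lambda seconds: None, relocating_shards=0
    )
    print(done)
```

If that assumption holds, the loop can only exit once the reported `relocating_shards` value reads 0, so with no timeout it keeps waiting for as long as that value stays above 0.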
Here's my action.yml:
actions:
  1:
    action: shrink
    description: Shrink indices older than 31 days on the node with the most available space. Delete source index after successful shrink, then reroute the shrunk index with the provided parameters.
    options:
      ignore_empty_list: True
      shrink_node: DETERMINISTIC
      node_filters:
        permit_masters: False
        exclude_nodes: ['not_this_node']
      number_of_shards: 2
      number_of_replicas: 0
      shrink_prefix:
      shrink_suffix: '-shrinked'
      delete_after: True
      post_allocation:
        allocation_type: include
        key: node_tag
        value: cold
      wait_for_active_shards: 1
      extra_settings:
        settings:
          index.codec: best_compression
      wait_for_completion: True
      wait_for_rebalance: True
      wait_interval: 9
      max_wait: -1
    filters:
    - filtertype: pattern
      kind: prefix
      value: log-
      exclude: False
    - filtertype: age
      source: name
      direction: older
      timestring: '%Y.%m.%d'
      unit: days
      unit_count: 31
      exclude: False
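For clarity on what the two filters are meant to select, here is a small Python sketch of the same logic. It is only an approximation of the filter behavior, not Curator's implementation: the function `select_old_indices` and the index names in the example are hypothetical, and it assumes each index name is exactly the prefix followed by the date.

```python
from datetime import datetime, timedelta
from typing import Iterable, List, Optional


def select_old_indices(
    names: Iterable[str],
    prefix: str = "log-",
    timestring: str = "%Y.%m.%d",
    unit_count: int = 31,
    now: Optional[datetime] = None,
) -> List[str]:
    """Keep names that start with `prefix` and whose date is older than `unit_count` days."""
    now = now or datetime.now()
    cutoff = now - timedelta(days=unit_count)
    selected = []
    for name in names:
        if not name.startswith(prefix):
            continue  # pattern filter: kind=prefix
        try:
            # Assumption: the rest of the name is exactly the date.
            stamp = datetime.strptime(name[len(prefix):], timestring)
        except ValueError:
            continue  # no parsable date in the name
        if stamp < cutoff:  # age filter: direction=older
            selected.append(name)
    return selected


if __name__ == "__main__":
    # Hypothetical index names, evaluated as of the date in the log above.
    names = ["log-2018.06.01", "log-2018.07.15", "other-2018.01.01"]
    print(select_old_indices(names, now=datetime(2018, 7, 31)))
```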
I repeat: I can run the shrink just fine "by hand," but Curator fails to continue past the allocation phase, even though all the evidence shows that allocation finished long ago.