Can't shrink index using Curator

Hi there,

I'm trying to use Curator to move old data from the main (ingest) data nodes to a "cold" one.
I believe that the Curator shrink action should be able to help me with this.
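For context, a shrink action file for Curator 5.x looks roughly like the sketch below. The node name, index prefix, and `-archived` suffix are taken from this thread; the rest of the values are illustrative, not my exact config:

```yaml
actions:
  1:
    action: shrink
    description: >-
      Shrink old logstash indices onto the cold node, appending
      "-archived" to the target index name.
    options:
      shrink_node: mizar-es-5
      number_of_shards: 1
      shrink_suffix: '-archived'
      delete_after: True
      wait_for_completion: True
    filters:
      - filtertype: pattern
        kind: prefix
        value: logstash-paloalto-
```

Curator first relocates all shards of the source index to `shrink_node`, then performs the shrink, which is why the error below complains about a shard not being on that node.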

The error I'm seeing is:

2017-11-08 12:30:21,378 INFO Source index: logstash-paloalto-2017.10.07 -- Target index: logstash-paloalto-2017.10.07-archived
2017-11-08 12:30:21,811 INFO Moving shards to shrink node: "mizar-es-5"
2017-11-08 12:30:21,891 INFO Health Check for all provided keys passed.
2017-11-08 12:30:21,973 ERROR Failed to complete action: shrink. <class 'curator.exceptions.FailedExecution'>: Exception encountered. Rerun with loglevel DEBUG and/or check Elasticsearch logs for more information. Exception: Unable to shrink index "logstash-paloalto-2017.10.07" as not all shards were found on the designated shrink node (mizar-es-5): [{'primary': True, 'shard': '2'}]

Curator version is 5.2.0
Elasticsearch Python module version is 5.4.0

I can confirm that a previous allocation towards the "cold" node was performed, and that it waited for completion, so:

  • What could be happening?
  • Could the shards still be on other nodes? How can I check where the shards for a given index are?
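For the second question, `GET /_cat/shards/<index>?h=index,shard,prirep,state,node` lists the node each shard lives on. As a rough illustration of checking that output against the shrink node, here is a small stdlib-only parser. The sample data mimics the situation in the error message; `mizar-es-3` is a hypothetical second node name, not one from this thread:

```python
# Check which nodes hold the shards of an index, based on the text output
# of GET /_cat/shards/<index>?h=index,shard,prirep,state,node
def shards_not_on_node(cat_shards_output, node_name):
    """Return (shard, prirep) pairs that are NOT on node_name."""
    misplaced = []
    for line in cat_shards_output.strip().splitlines():
        index, shard, prirep, state, node = line.split()
        if node != node_name:
            misplaced.append((shard, prirep))
    return misplaced

# Sample output resembling the error in this thread: primary shard 2
# never made it to the designated shrink node.
sample = """\
logstash-paloalto-2017.10.07 0 p STARTED mizar-es-5
logstash-paloalto-2017.10.07 1 p STARTED mizar-es-5
logstash-paloalto-2017.10.07 2 p STARTED mizar-es-3
"""
print(shards_not_on_node(sample, "mizar-es-5"))  # → [('2', 'p')]
```

If this returns a non-empty list, Curator's shrink will fail exactly as shown above.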


I can answer one of my own questions after some quick research.

"Could the shards still be on other nodes? How can I check where the shards for a given index are?"

It seems that despite having a single "settings.index.routing.allocation" setting, i.e. "settings.index.routing.allocation.require._name: mizar-es-5", the shards are not being allocated on that node.

So now I know the cause of the Curator error but I don't know why shards won't move from the "hot" nodes.
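To find out why a shard stays put, the `_cluster/allocation/explain` API reports, per node, which allocation decider is blocking the move (disk watermarks, allocation filtering, awareness, etc.). Below is a stdlib-only sketch that builds the request for the shard named in the error; the `localhost:9200` endpoint is an assumption, so adjust it for your cluster before sending:

```python
import json
import urllib.request

# Endpoint is an assumption -- adjust for your cluster.
ES_URL = "http://localhost:9200"

def allocation_explain_request(index, shard, primary=True):
    """Build a POST _cluster/allocation/explain request (not yet sent)."""
    body = json.dumps({
        "index": index,
        "shard": shard,
        "primary": primary,
    }).encode("utf-8")
    return urllib.request.Request(
        ES_URL + "/_cluster/allocation/explain",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

# Primary shard 2 is the one the Curator error message complains about:
req = allocation_explain_request("logstash-paloalto-2017.10.07", 2)
print(req.get_method(), req.full_url)
# To actually send it against a live cluster:
#   response = urllib.request.urlopen(req)
# The JSON response lists each node and the decider that rejected the shard.
```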

What do the Elasticsearch logs say? From both the source and target nodes, as well as the elected master node?

Hi Aaron,

Sorry for the delay, I took some days off.

I don't see anything in the data nodes' logs about the shrink action. Maybe I should be looking for DEBUG-level events instead of INFO-level events?

I DO see DanglingIndicesState messages ("can not be imported as a dangling index, as index with same name already exists in cluster metadata") left over from a past accidental cluster split. Could that be related to the issue at hand?

Thank you.

The inability to allocate dangling indices most definitely could prevent the shrink action from being able to detect when things have stopped shuffling around. You will need to clean that up.

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.