ILM Retry API not honoring timeouts

I'm try to retry ILM steps for all indexes, using following command, however

POST */_ilm/retry
{
  "timeout": "30m",
  "master_timeout": "30m"
}

Still following error is thrown. Is there another param to override process_cluster_event_timeout_exception?

{
  "error" : {
    "root_cause" : [
      {
        "type" : "process_cluster_event_timeout_exception",
        "reason" : "failed to process cluster event (ilm-re-run) within 30s"
      }
    ],
    "type" : "process_cluster_event_timeout_exception",
    "reason" : "failed to process cluster event (ilm-re-run) within 30s"
  },
  "status" : 503
}

The parameters go in the URL: POST */_ilm/retry?master_timeout=30m&timeout=30m. However if it's taking more than 30s to do this then there's something very wrong with your cluster, your master is far too busy doing other things. You should investigate (e.g. look at cluster pending tasks) and fix that, not just pile more work onto an already-overloaded master.

3 Likes

Ouch :sweat_smile: Thanks a lot.

A quick noob question, should our logstash point to master nodes, or data nodes? We've pointed it to master node with the assumption that master decides on which shard should the data be kept.

If we send it to data node, can we simply point it to dns of pool of data nodes?

You should always prefer to send traffic to non-master nodes.

1 Like

I opened a bug report since I think it would have helped to reject the request rather than silently ignore the bits we didn't expect:

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.