Elasticsearch 7.17.9 - current step is not recognized for shrink action, despite not having a shrink action defined

This is logged at the ERROR level and I'm not sure why. The ILM policy does not have a shrink action defined.

Sample log entry. The only change is that the account ID in the index name has been replaced with 12345.

current step [{"phase":"warm","action":"shrink","name":"shrink"}] for index [alerts-account12345-000013] with policy [alerts-active-90] is not recognized

Output of GET _ilm/policy/alerts-active-90, with the long list of indices removed:

{
  "alerts-active-90" : {
    "version" : 32,
    "modified_date" : "2023-04-27T10:21:22.062Z",
    "policy" : {
      "phases" : {
        "warm" : {
          "min_age" : "0d",
          "actions" : {
            "allocate" : {
              "number_of_replicas" : 2,
              "include" : { },
              "exclude" : { },
              "require" : {
                "data" : "warm"
              }
            },
            "forcemerge" : {
              "max_num_segments" : 1
            },
            "readonly" : { },
            "set_priority" : {
              "priority" : 25
            }
          }
        },
        "cold" : {
          "min_age" : "6d",
          "actions" : {
            "allocate" : {
              "number_of_replicas" : 1,
              "include" : { },
              "exclude" : { },
              "require" : {
                "data" : "cold"
              }
            },
            "set_priority" : {
              "priority" : 10
            }
          }
        },
        "hot" : {
          "min_age" : "0ms",
          "actions" : {
            "rollover" : {
              "max_size" : "50gb",
              "max_primary_shard_size" : "40gb",
              "max_age" : "7d"
            },
            "set_priority" : {
              "priority" : 50
            }
          }
        },
        "delete" : {
          "min_age" : "88d",
          "actions" : {
            "delete" : {
              "delete_searchable_snapshot" : true
            }
          }
        }
      }
    },
    "in_use_by" : {
      "indices" : [
        "alerts-account12345-000013"
      ]
    }
  }
}
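One quick way to confirm what the error is saying is to check whether the index's current action appears in that phase of the policy at all. The sketch below uses a hypothetical helper (`phase_has_action` is not an Elasticsearch API) and assumes the GET _ilm/policy response has already been parsed into a Python dict, abbreviated here to the warm phase.

```python
# Abbreviated copy of the GET _ilm/policy/alerts-active-90 response above
# (only the warm phase is kept for brevity).
POLICY_RESPONSE = {
    "alerts-active-90": {
        "policy": {
            "phases": {
                "warm": {
                    "min_age": "0d",
                    "actions": {
                        "allocate": {"number_of_replicas": 2, "require": {"data": "warm"}},
                        "forcemerge": {"max_num_segments": 1},
                        "readonly": {},
                        "set_priority": {"priority": 25},
                    },
                }
            }
        }
    }
}


def phase_has_action(policy_response: dict, policy_name: str, phase: str, action: str) -> bool:
    """Return True if `action` is defined in `phase` of the named policy."""
    phases = policy_response[policy_name]["policy"]["phases"]
    return action in phases.get(phase, {}).get("actions", {})


# The step from the error log is warm / shrink / shrink.
print(phase_has_action(POLICY_RESPONSE, "alerts-active-90", "warm", "shrink"))
print(phase_has_action(POLICY_RESPONSE, "alerts-active-90", "warm", "forcemerge"))
```

The first call prints False and the second True, which matches the error: the policy has no shrink action for the step to belong to.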
  1. GET /alerts-account12345-000013/_ilm/explain is always a good first step in debugging. If you add the output it might give us another hint.
  2. If there is any chance the policy changed (or just to be sure), you could also try a POST /alerts-account12345-000013/_ilm/retry and then run the explain again to see if that changed anything.
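If you would rather script these two steps than run them in Kibana Dev Tools, a minimal sketch with Python's standard library could look like this. The cluster URL `http://localhost:9200` is an assumption, and authentication and TLS are left out; `ilm_request` is a hypothetical helper that only builds the requests, and `send` would actually issue them.

```python
import json
import urllib.request

ES_URL = "http://localhost:9200"  # assumption: adjust host, port, auth and TLS for your cluster


def ilm_request(index: str, endpoint: str, method: str = "GET") -> urllib.request.Request:
    """Build a request for a per-index ILM endpoint such as 'explain' or 'retry'."""
    return urllib.request.Request(
        f"{ES_URL}/{index}/_ilm/{endpoint}",
        method=method,
        headers={"Content-Type": "application/json"},
    )


def send(req: urllib.request.Request) -> dict:
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


explain_req = ilm_request("alerts-account12345-000013", "explain")
retry_req = ilm_request("alerts-account12345-000013", "retry", method="POST")

# Against a live cluster: explain, retry, then explain again.
# print(json.dumps(send(explain_req), indent=2))
# send(retry_req)
# print(json.dumps(send(explain_req), indent=2))
```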

Ah, thank you so much, that showed the problem.
It's a failed phase execution left over from a previous policy, with shrink_index_name carried over.

{
  "indices" : {
    "alerts-account12345-000037" : {
      "index" : "alerts-account1234-000037",
      "managed" : true,
      "policy" : "alerts-active-90",
      "lifecycle_date_millis" : 1666549567992,
      "age" : "197.22d",
      "phase" : "warm",
      "phase_time_millis" : 1666554947258,
      "action" : "shrink",
      "action_time_millis" : 1666551613226,
      "step" : "shrink",
      "step_time_millis" : 1666554947258,
      "is_auto_retryable_error" : true,
      "failed_step_retry_count" : 2,
      "shrink_index_name" : "shrink-dmkx-alerts-account1234-000037",
      "phase_execution" : {
        "policy" : "alerts-account1234",
        "phase_definition" : {
          "min_age" : "0d",
          "actions" : {
            "forcemerge" : {
              "max_num_segments" : 1
            },
            "allocate" : {
              "number_of_replicas" : 2,
              "include" : { },
              "exclude" : { },
              "require" : {
                "data" : "warm"
              }
            },
            "readonly" : { },
            "set_priority" : {
              "priority" : 25
            }
          }
        },
        "version" : 119,
        "modified_date_in_millis" : 1666554898802
      }
    }
  }
}

What would you recommend I do to fix this?

I haven't tested this, but I'd have thought that updating the policy (so it no longer has the shrink action) and then retrying should do the trick?

The interesting thing here is that the alerts-account1234 policy no longer exists, and hasn't for a month or so. All the indices were moved from their account-specific ILM policy to a shared, behaviour-based one, because Elasticsearch seemed unhappy with a thousand or so ILM policies: loading the ILM page timed out after 30s, and so did listing the policies through the API.

So these indices have the correct policy set on them, but their phase_execution still references the old policy, which no longer exists, and they will never recover on their own.
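To find every index in this state instead of checking them one at a time, you could run the explain API over a wider index pattern and compare each index's `policy` with `phase_execution.policy`. A mismatch on its own is expected for a while after switching policies, since ILM finishes the current phase using the cached definition; it only matters when the index is stuck, as here. Note that the cached phase_definition in the output above does not list a shrink action either. The function below is a hypothetical sketch of that comparison, using a dict trimmed from that output.

```python
# Trimmed from the GET <index>/_ilm/explain output above.
EXPLAIN = {
    "indices": {
        "alerts-account12345-000037": {
            "index": "alerts-account1234-000037",
            "managed": True,
            "policy": "alerts-active-90",
            "phase": "warm",
            "action": "shrink",
            "step": "shrink",
            "phase_execution": {"policy": "alerts-account1234", "version": 119},
        }
    }
}


def stale_phase_executions(explain: dict) -> list:
    """List managed indices whose cached phase definition comes from another policy."""
    stale = []
    for name, info in explain.get("indices", {}).items():
        cached = info.get("phase_execution", {}).get("policy")
        if info.get("managed") and cached is not None and cached != info.get("policy"):
            stale.append({
                "index": name,
                "policy": info["policy"],
                "cached_policy": cached,
                "step": (info.get("phase"), info.get("action"), info.get("step")),
            })
    return stale


for entry in stale_phase_executions(EXPLAIN):
    print(entry)
```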

The Move to Lifecycle Step API looked promising, so I experimented with:

POST _ilm/move/alerts-account12345-000037
{
  "current_step": {
    "phase": "warm",
    "action": "shrink",
    "name": "shrink"
  },
  "next_step": {
    "phase": "cold"
  }
}

This moved the index to cold, and phase_execution now references the correct policy. Oddly, shrink_index_name is still set, but since the index was already past the alerts-active-90 retention period it was deleted shortly afterwards. Only 7 indices were affected, all past the retention period, so I simply deleted the others.
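The move API checks current_step against the step the index is actually on, so when repeating this for several indices it can help to build the body from each index's explain entry rather than typing it by hand. A minimal sketch; `build_move_body` is a hypothetical helper, and the target phase ("cold", as in the request above) is whatever you choose.

```python
import json


def build_move_body(explain_entry: dict, target_phase: str) -> dict:
    """Build a body for POST _ilm/move/<index> from one index's explain entry."""
    return {
        "current_step": {
            "phase": explain_entry["phase"],
            "action": explain_entry["action"],
            "name": explain_entry["step"],  # explain calls it "step", the move API calls it "name"
        },
        # Only the phase is given, as in the request above.
        "next_step": {"phase": target_phase},
    }


entry = {"phase": "warm", "action": "shrink", "step": "shrink"}
print(json.dumps(build_move_body(entry, "cold"), indent=2))
```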

On a somewhat related note, I had a shrink action in the warm phase to shrink down to 1 shard, but found that Elasticsearch reported an error if the source index already had 1 shard; apparently shrinking from 1 to 1 wasn't allowed. I haven't tested that for a few versions, though, so it may have been fixed.
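For reference, the shrink index API documents two rules on primary shard counts: the target index must have fewer primary shards than the source, and the target count must be a factor of the source count. The first rule alone rules out 1 to 1. Below is a hypothetical pre-check of those two rules (not an Elasticsearch function); how ILM itself handles an index that already has the target shard count may differ between versions.

```python
def shrink_is_valid(source_shards: int, target_shards: int) -> bool:
    """Check the documented shrink preconditions on primary shard counts."""
    if source_shards < 1 or target_shards < 1:
        raise ValueError("shard counts must be positive")
    # Target must have fewer primaries and evenly divide the source count.
    return target_shards < source_shards and source_shards % target_shards == 0


for source, target in [(1, 1), (4, 1), (6, 4), (8, 2)]:
    print(source, "->", target, shrink_is_valid(source, target))
```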

  1. Great that this solved it.
  2. I had hoped that https://github.com/elastic/elasticsearch/pull/74219 (added in 7.15) would have caught this. Could this be a leftover from an older version?

On Apr 27th the index age was 197d, which puts the index at around 2022-10-12.

Relevant upgrade dates:
2021-10-07 es 7.15.0
2021-12-18 es 7.16.1
2022-10-12 approximate index date, based on its age
2023-04-29 es 7.17.9
2023-05-03 es 7.17.10
2023-05-09 I used the Move to Lifecycle Step API to fix the issue
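The age-based estimate works backwards from the age; the explain output also includes `lifecycle_date_millis` (epoch milliseconds), which can be converted directly. Converting 1666549567992 gives 2022-10-23 UTC, and adding the reported age of 197.22d to that lands on about 2023-05-08, so the explain output was probably captured shortly before the fix rather than on Apr 27th. Either way, the index dates from well after the 7.15 and 7.16.1 upgrades. A small sketch with Python's standard library:

```python
from datetime import date, datetime, timedelta, timezone


def millis_to_utc(millis: int) -> datetime:
    """Convert an epoch-milliseconds value from the explain output to a UTC datetime."""
    return datetime.fromtimestamp(millis / 1000, tz=timezone.utc)


lifecycle_date = millis_to_utc(1666549567992)
print(lifecycle_date.date())                             # 2022-10-23

# Age-based estimate from the post: 197 days before Apr 27th.
print(date(2023, 4, 27) - timedelta(days=197))           # 2022-10-12

# Lifecycle date plus the reported age of 197.22d.
print((lifecycle_date + timedelta(days=197.22)).date())  # 2023-05-08
```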
