This is logged at the ERROR level and I'm not sure why. The ILM policy does not have a shrink action defined.
Sample log, with the index name changed only to swap the account ID with 12345.
current step [{"phase":"warm","action":"shrink","name":"shrink"}] for index [alerts-account12345-000013] with policy [alerts-active-90] is not recognized
Output from GET _ilm/policy/alerts-active-90 with the long list of indices removed.
GET /alerts-account12345-000013/_ilm/explain is always a good first step in debugging. If you add the output it might give us another hint.
If there is any chance the policy changed (or just to be sure), you could also try a POST /alerts-account12345-000013/_ilm/retry and then run the explain again to see if that changed anything.
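For anyone scripting these two calls rather than using the console, here is a minimal sketch with only the Python standard library. The base URL is an assumption (a local, unsecured node), and the helper names are made up for illustration. As far as I know, retry only applies to an index that is currently in the ERROR step, so expect an error response otherwise.

```python
from urllib.request import Request

# Assumption: a local node without authentication. Adjust for your cluster.
ES_URL = "http://localhost:9200"


def explain_request(index, base_url=ES_URL):
    """Build GET /<index>/_ilm/explain (hypothetical helper name)."""
    return Request(f"{base_url}/{index}/_ilm/explain", method="GET")


def retry_request(index, base_url=ES_URL):
    """Build POST /<index>/_ilm/retry (hypothetical helper name)."""
    return Request(f"{base_url}/{index}/_ilm/retry", method="POST")


# To actually send one of them:
#   from urllib.request import urlopen
#   with urlopen(explain_request("alerts-account12345-000013")) as resp:
#       print(resp.read().decode())
```

The requests are only built here, not sent, so the snippet can be checked without a running cluster.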
The interesting thing here is that the alerts-account1234 policy doesn't exist anymore and hasn't for a month or so. The indices were all moved from the account-specific ILM policies to a shared one based on behaviour, as Elasticsearch seemed unhappy about having a thousand or so ILM policies: loading the ILM page timed out after 30s, and so did listing them through the API.
So I have indices that have the correct policy listed on them, but the phase_execution still references the old policy that doesn't exist, and it will never recover by itself.
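To find every index in this state, a small check over the explain output could look like the sketch below. It assumes the response has the usual shape, with a top-level `indices` map and a `policy` field both on the index entry and inside `phase_execution`. The sample data is made up to illustrate the mismatch and is not real output from my cluster; the function name is hypothetical.

```python
def stale_phase_execution(explain_response):
    """Return {index: (assigned_policy, cached_policy)} where the two differ."""
    stale = {}
    for name, info in explain_response.get("indices", {}).items():
        if not info.get("managed"):
            continue  # unmanaged indices have no policy to compare
        assigned = info.get("policy")
        cached = info.get("phase_execution", {}).get("policy")
        if cached is not None and cached != assigned:
            stale[name] = (assigned, cached)
    return stale


# Illustrative data only; the old policy name here is a placeholder.
sample = {
    "indices": {
        "alerts-account12345-000013": {
            "managed": True,
            "policy": "alerts-active-90",
            "phase_execution": {"policy": "alerts-account12345"},
        },
        "alerts-account12345-000014": {
            "managed": True,
            "policy": "alerts-active-90",
            "phase_execution": {"policy": "alerts-active-90"},
        },
    }
}

print(stale_phase_execution(sample))
```

Feeding it the parsed JSON from GET /alerts-*/_ilm/explain should list the indices whose cached phase definition still points at a different policy.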
This moved it to cold and phase_execution references the correct policy now. It's odd that the shrink_index_name is still set, but since the index is beyond the alerts-active-90 retention period it was removed shortly after. There were only 7 indices affected by this, all beyond the retention period, so I deleted the others.
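For reference, a rough sketch of how a move-to-step request (POST _ilm/move/<index>) could be put together. The `current_step` is taken from the log line above and has to match what explain reports for the index. The `next_step` values are an assumption for illustration, not the exact ones I used; they must name a step that exists in the policy. The helper name is hypothetical.

```python
import json


def move_request(index, current_step, next_step):
    """Build method, path and JSON body for the ILM move-to-step API."""
    path = f"/_ilm/move/{index}"
    body = {"current_step": current_step, "next_step": next_step}
    return "POST", path, json.dumps(body)


method, path, body = move_request(
    "alerts-account12345-000013",
    # Taken from the ERROR log line.
    current_step={"phase": "warm", "action": "shrink", "name": "shrink"},
    # Assumption: placeholder target step in the cold phase.
    next_step={"phase": "cold", "action": "complete", "name": "complete"},
)
print(method, path)
print(body)
```

Running the explain again afterwards shows whether phase_execution now references the policy that is actually assigned.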
On a somewhat related note, I had a shrink step in warm to shrink down to 1 shard, but found that Elasticsearch reported an error if the source index had 1 shard. Apparently shrinking from 1 to 1 wasn't allowed. Haven't tested that one for a few versions though, so it's possible it was fixed.
On Apr 27th the age was 197d, so the index was created around 2022-10-12.
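The date arithmetic can be checked quickly in Python. The year 2023 is an assumption inferred from the upgrade dates in this thread, since the explain output only gave the age.

```python
from datetime import date, timedelta

observed = date(2023, 4, 27)   # assumption: the "Apr 27th" above is in 2023
age = timedelta(days=197)      # age reported as 197d

creation = observed - age
print(creation)  # 2022-10-12
```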
Relevant upgrade dates
2021-10-07 es 7.15.0
2021-12-18 es 7.16.1
2022-10-12 index age suggests this date
2023-04-29 es 7.17.9
2023-05-03 es 7.17.10
2023-05-09 I used the move to lifecycle steps api to fix the issue