Indices managed by Index Lifecycle Policy don't get deleted

I'm using Filebeat to ingest network logs. I have an Index Lifecycle Policy that applies to all Filebeat indices*. I have configured the Delete phase to move data into that phase when it's 30 days old. However, indices never seem to get deleted: indices older than 30 days are still around**, and I have to delete them manually. Am I missing a step?

*I believe this because the 'linked indices' column in the Index Lifecycle Policy page shows the vast majority of indices are governed by the 'filebeat' policy.

**I'm judging the age of an index by its name, for instance .monitoring-es-7-2025.01.14

Hi @artschooldropout. Pick one of the indices that you think should be deleted, then run and share the following:

GET myindex/_ilm/explain

Then share the entire ILM policy for that index as well.
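For the policy itself, a request like this should work (assuming the policy turns out to be named 'filebeat', which the explain output will confirm):

GET _ilm/policy/filebeat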

Note that monitoring indices may be handled a bit differently (they use a different ILM policy).

Thanks @stephenb! Here's the output of GET myindex/_ilm/explain:

{
  "indices": {
    ".ds-filebeat-8.15.3-2024.12.09-000056": {
      "index": ".ds-filebeat-8.15.3-2024.12.09-000056",
      "managed": true,
      "policy": "filebeat",
      "index_creation_date_millis": 1733773023359,
      "time_since_index_creation": "38.82d",
      "lifecycle_date_millis": 1733834821599,
      "age": "38.1d",
      "phase": "warm",
      "phase_time_millis": 1735044657439,
      "action": "migrate",
      "action_time_millis": 1735044658039,
      "step": "check-migration",
      "step_time_millis": 1735044658639,
      "step_info": {
        "message": "Waiting for all shard copies to be active",
        "shards_left_to_allocate": -1,
        "all_shards_active": false,
        "number_of_replicas": 1
      },
      "phase_execution": {
        "policy": "filebeat",
        "phase_definition": {
          "min_age": "14d",
          "actions": {
            "set_priority": {
              "priority": 50
            }
          }
        },
        "version": 10,
        "modified_date_in_millis": 1734030610502
      }
    }
  }
}

And here's the ILM:

{
  "filebeat": {
    "version": 10,
    "modified_date": "2024-12-12T19:10:10.502Z",
    "policy": {
      "phases": {
        "warm": {
          "min_age": "14d",
          "actions": {
            "set_priority": {
              "priority": 50
            }
          }
        },
        "cold": {
          "min_age": "14d",
          "actions": {
            "set_priority": {
              "priority": 0
            }
          }
        },
        "hot": {
          "min_age": "0ms",
          "actions": {
            "rollover": {
              "max_age": "30d",
              "max_primary_shard_size": "50gb"
            }
          }
        },
        "delete": {
          "min_age": "30d",
          "actions": {
            "delete": {
              "delete_searchable_snapshot": true
            }
          }
        }
      }
    },
    "in_use_by": {
      "indices": [
        ".ds-filebeat-8.15.0-2024.12.01-000129",
        ".ds-filebeat-8.15.3-2024.12.18-000072",
        ".ds-filebeat-8.15.3-2025.01.03-000088",
        ".ds-filebeat-8.15.3-2024.12.31-000086",
        ".ds-filebeat-8.15.3-2025.01.16-000145",
        ".ds-filebeat-8.15.3-2025.01.16-000143",
        ".ds-filebeat-8.9.0-2023.09.10-000002",
        ".ds-filebeat-8.15.3-2025.01.15-000141",
        ".ds-filebeat-8.14.3-2024.09.13-000018",
        ".ds-filebeat-8.15.3-2025.01.01-000087",
        ".ds-filebeat-8.15.3-2024.12.29-000085",
        ".ds-filebeat-8.15.3-2024.12.28-000081",
        ".ds-filebeat-8.13.4-2024.07.12-000023",
        ".ds-filebeat-8.14.1-2024.08.22-000023",
        ".ds-filebeat-8.15.3-2025.01.13-000136",
        ".ds-filebeat-8.15.3-2025.01.14-000138",
        ".ds-filebeat-8.15.3-2024.12.22-000076",
        ".ds-filebeat-8.15.3-2025.01.10-000131",
        ".ds-filebeat-8.15.3-2024.12.25-000078",
        ".ds-filebeat-8.15.3-2025.01.14-000139",
        ".ds-filebeat-8.15.3-2024.12.23-000077",
        ".ds-filebeat-8.15.3-2025.01.08-000117",
        ".ds-filebeat-8.15.3-2025.01.09-000119",
        ".ds-filebeat-8.15.3-2024.12.09-000056",
        ".ds-filebeat-8.15.3-2024.12.26-000079",
        ".ds-filebeat-8.15.3-2025.01.11-000134",
        ".ds-filebeat-8.15.3-2025.01.08-000118",
        ".ds-filebeat-8.15.3-2025.01.12-000135",
        ".ds-filebeat-8.9.1-2023.11.10-000029",
        ".ds-filebeat-8.13.2-2024.05.15-000018",
        ".ds-filebeat-8.15.3-2024.12.18-000073",
        ".ds-filebeat-8.15.3-2024.12.19-000074",
        ".ds-filebeat-8.15.3-2024.12.20-000075",
        ".ds-filebeat-8.10.3-2024.05.05-000106"
      ],
      "data_streams": [
        "filebeat-8.10.3",
        "filebeat-8.15.0",
        "filebeat-8.13.2",
        "filebeat-8.14.1",
        "filebeat-8.9.1",
        "filebeat-8.9.0",
        "filebeat-8.13.4",
        "filebeat-8.14.3",
        "filebeat-8.15.3"
      ],
      "composable_templates": [
        "filebeat-8.10.3",
        "filebeat-8.15.0",
        "filebeat-8.13.2",
        "filebeat-8.14.1",
        "filebeat-8.9.1",
        "filebeat-8.9.0",
        "filebeat-8.13.4",
        "filebeat-8.14.3",
        "filebeat-8.15.3"
      ]
    }
  }
}
      "step_info": {
        "message": "Waiting for all shard copies to be active",
        "shards_left_to_allocate": -1,
        "all_shards_active": false,
        "number_of_replicas": 1
      },

^^^ This is a problem. You need to figure out what is going on with the shards; run the allocation explain on this index. It looks like it is stuck in the warm phase because of this, and it cannot move to cold until this is fixed.
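A sketch of the targeted allocation explain call (the index name is taken from the explain output above; picking shard 0's replica is just an assumption, adjust as needed):

GET _cluster/allocation/explain
{
  "index": ".ds-filebeat-8.15.3-2024.12.09-000056",
  "shard": 0,
  "primary": false
}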

Also, just FYI: all phase ages like the 14d for warm are measured from the time of the hot rollover, not from index creation. So if rollover takes 3 days, the indices won't move to warm until 17 days after index creation (3 days to rollover + 14 days before warm).
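As a rough illustration with the policy above (the 1-day rollover is just an assumption; the real trigger is whichever of 50gb or 30d is hit first):

day 0    index created, hot phase
day ~1   rollover; all later min_age clocks start counting here
day ~15  warm (min_age 14d after rollover); cold is also set to 14d, so it follows right after
day ~31  delete (min_age 30d after rollover), i.e. roughly 31 days after creation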

Got it - thanks for the explanation!

How many nodes are in your cluster? Is the cluster state green?

(Do a GET on _cluster/health; it will return JSON output.)
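In Dev Tools that is simply:

GET _cluster/health

The fields worth checking are "status", "number_of_nodes", and "unassigned_shards".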

I've got two nodes in the cluster, and the cluster health was yellow due to unassigned shards.

I issued GET _cluster/allocation/explain (which I found on another thread); it reported errors about the shard allocation retry limit being reached. The error helpfully recommended issuing POST /_cluster/reroute?retry_failed&metric=none to retry, which seems to be working.
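To confirm the retries actually worked, I'm keeping an eye on these (the column list is just what seemed useful):

GET _cluster/health

GET _cat/shards?v&h=index,shard,prirep,state,unassigned.reason&s=state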

I would recommend setting up some alerting, so that you or someone is informed when your cluster state goes yellow for whatever reason.

That's a good suggestion. However, we didn't cough up for the license, so (as I understand it) we can't do alerting.

There are plenty of completely free alerting tools you could integrate. For yellow/red status, it's a single curl command and a check for "not green":

curl -s -k -u "${EUSER}":"${EPASS}"  "https://${EHOST}:${EPORT}/_cluster/health" | jq -r .status | egrep -q '^green$' || echo "not green, please check!"

Any other response than "green" is IMHO worthy of investigation.
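If you drop that one-liner into a small script, a cron entry along these lines (path and schedule are placeholders; assumes cron is set up to mail command output) will nag you whenever the check prints anything:

*/5 * * * * /usr/local/bin/check_es_green.sh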

That's great - thanks @RainTown!