ILM not deleting index

I am running a simple setup with ILM and a small index. ILM should delete the index, but instead it is stuck at "step": "check-rollover-ready".

Here is my script to recreate the index:

#!/bin/bash

echo -e "\n update settings"
curl -s -XPUT localhost:9200/_cluster/settings -H"content-type: application/json" -d'
{
  "persistent": {
    "indices.lifecycle.poll_interval": "1s",
    "logger.org.elasticsearch.xpack.ilm": "TRACE"
  }
}
'

echo -e "\nadd index template"
curl -s -XPUT localhost:9200/_template/devtest -H"content-type: application/json" -d'
{
  "index_patterns": [
    "devtest-*"
  ],
  "settings": {
    "index": {
      "lifecycle": {
        "name": "devtest",
        "rollover_alias": "devtest"
      },
      "number_of_shards": "1",
      "number_of_replicas": "0"
    }
  }
}
'

echo -e "\nadd policy version 1"
curl -s -XPUT -H"content-type: application/json"  localhost:9200/_ilm/policy/devtest -d'
{
  "policy": {
    "phases": {
      "hot": {
        "min_age": "0ms",
        "actions": {
          "rollover": {
            "max_size": "50mb",
            "max_age": "2m"
          }
        }
      },
      "delete": {
        "min_age": "2m",
        "actions": {
          "delete": {}
        }
      }
    }
  }
}
'

echo -e "\ncreating index devtest"
curl -s -XPUT -H"content-type: application/json"  localhost:9200/devtest-000001 -d'
{
  "aliases": {
    "devtest": {
      "is_write_index": true
    }
  }
}
'

This is the result after waiting a few minutes:

# curl -s localhost:9200/devtest/_ilm/explain | jq . 
{
  "indices": {
    "devtest-000001": {
      "index": "devtest-000001",
      "managed": true,
      "policy": "devtest",
      "index_creation_date_millis": 1688070066041,
      "time_since_index_creation": "14.44m",
      "lifecycle_date_millis": 1688070066041,
      "age": "14.44m",
      "phase": "hot",
      "phase_time_millis": 1688070125507,
      "action": "rollover",
      "action_time_millis": 1688070139520,
      "step": "check-rollover-ready",
      "step_time_millis": 1688070139520,
      "phase_execution": {
        "policy": "devtest",
        "phase_definition": {
          "min_age": "0ms",
          "actions": {
            "rollover": {
              "max_size": "50mb",
              "max_age": "2m"
            }
          }
        },
        "version": 1,
        "modified_date_in_millis": 1688070043004
      }
    }
  }
}

curl -s localhost:9200 | jq .version.number

"8.6.2"

Please advise. Thank you.

The values you are using are pretty small. ILM works best with sizes around tens of GB and ages around days.

It is even pretty hard to test ILM with those small values.

Are you planning to use those small values in production, or are you just trying to test ILM?

No, production will have much larger indices. This is just to keep things as simple as possible so I can troubleshoot the issue.
I'll go ahead and make it larger and test again.

Hi @raymondmintz11, welcome to the community!

As @leandrojmp mentioned, ILM is meant to be used on the scale of GB and hours and up...
In my experience, making the values unrealistically small can actually make it harder to debug.

What are you actually trying to debug / figure out?

"indices.lifecycle.poll_interval": "1s",

Even though that is probably accepted, it is probably not really valid...

I'm trying to understand why the last index of my rollover gets stuck in "check-rollover-ready".
Basically, the last rollover index is not deleted as expected.
I have a cluster that uses ILM with hundreds of GB of data and, for testing purposes, have set the index to delete after 15 minutes, to no avail.

What does "last index" mean?

The last index is the last rollover; in this case it's the one ending in "*-000003":

green open soc-2023.06.06-000001                    W-0pBQgYRhCbnumtURENgQ 10 1         0 0   4.3kb   2.1kb
green open soc-2023.06.08-000001                    Ue4eTxq8QQK3ces6AlRp0w  1 1  21509904 0  40.3gb  20.1gb
green open soc-2023.06.08-000003                    _ypdsZCYTU6PkT0VPgyVDw  1 1  21719426 0  40.3gb  20.1gb

What is the _ilm/explain output for that index?

What is the ILM policy for that index?

Also, something looks odd: why did the numbers restart, and why are some missing? Did you manually update or recreate anything?

Sorry to switch the index in the middle of the example, but this index might highlight the issue better.

The step is stuck in "check-rollover-ready", so it never moves on to delete the index:

[Thu Jun 29 22:05:20] root@someHost:~# curl -s localhost:9200/soc-2023.06.11-000002/_ilm/explain | jq .  
{
  "indices": {
    "soc-2023.06.11-000002": {
      "index": "soc-2023.06.11-000002",
      "managed": true,
      "policy": "soc_datastream_policy",
      "index_creation_date_millis": 1686497836765,
      "time_since_index_creation": "18.26d",
      "lifecycle_date_millis": 1686497836765,
      "age": "18.26d",
      "phase": "hot",
      "phase_time_millis": 1686497837908,
      "action": "rollover",
      "action_time_millis": 1686497839509,
      "step": "check-rollover-ready",
      "step_time_millis": 1686497839509,
      "phase_execution": {
        "policy": "soc_datastream_policy-2023.06.11",
        "phase_definition": {
          "min_age": "0ms",
          "actions": {
            "rollover": {
              "max_size": "80gb"
            }
          }
        },
        "version": 1,
        "modified_date_in_millis": 1686430803487
      }
    }
  }
}
[Thu Jun 29 22:05:26] root@someHost:~# 
[Thu Jun 29 22:05:27] root@someHost:~# 
[Thu Jun 29 22:05:28] root@someHost:~# curl -s localhost:9200/_ilm/policy/soc_datastream_policy | jq . 
{
  "soc_datastream_policy": {
    "version": 7,
    "modified_date": "2023-06-29T18:12:51.122Z",
    "policy": {
      "phases": {
        "warm": {
          "min_age": "2m",
          "actions": {
            "set_priority": {
              "priority": 80
            }
          }
        },
        "hot": {
          "min_age": "0ms",
          "actions": {
            "set_priority": {
              "priority": 100
            },
            "rollover": {
              "max_size": "50gb",
              "max_age": "1m"
            }
          }
        },
        "delete": {
          "min_age": "4m",
          "actions": {
            "delete": {
              "delete_searchable_snapshot": true
            }
          }
        }
      }
    },
    "in_use_by": {
      "indices": [
        "soc-2023.06.11-000002",
        "soc-2023.06.09-000002"
      ],
      "data_streams": [],
      "composable_templates": []
    }
  }
}
[Thu Jun 29 22:05:32] root@someHost:~# 

So @raymondmintz11, there is some subtlety here, I think.

[Thu Jun 29 22:05:20] root@someHost:~# curl -s localhost:9200/soc-2023.06.11-000002/_ilm/explain | jq .  
{
  "indices": {
    "soc-2023.06.11-000002": {
      "index": "soc-2023.06.11-000002",
.....
        },
        "version": 1,
        "modified_date_in_millis": 1686430803487
      }

So you are looking at the latest version of the policy (version 7), but the actual index is still running under version 1, so that could explain it... There are points in time when an index...

If version 1 is 50GB + 30 days, you might be stuck in that... just a thought.

Trying to think if you can see the cached version of the ILM policy

Perhaps read this....

How changes are applied

When a policy is initially applied to an index, the index gets the latest version of the policy. If you update the policy, the policy version is bumped and ILM can detect that the index is using an earlier version that needs to be updated.

Changes to min_age are not propagated to the cached definition. Changing a phase’s min_age does not affect indices that are currently executing that phase.

For example, if you create a policy that has a hot phase that does not specify a min_age, indices immediately enter the hot phase when the policy is applied. If you then update the policy to specify a min_age of 1 day for the hot phase, that has no effect on indices that are already in the hot phase. Indices created after the policy update won’t enter the hot phase until they are a day old.

How new policies are applied

When you apply a different policy to a managed index, the index completes the current phase using the cached definition from the previous policy. The index starts using the new policy when it moves to the next phase.
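
One way to spot this situation is to compare the `version` inside `phase_execution` from `_ilm/explain` with the `version` returned by `_ilm/policy`. Below is a minimal Python sketch of that comparison, assuming both responses have already been fetched and parsed into dicts. The function name `cached_policy_version_lag` is made up for illustration, and the sample data is trimmed from the output pasted above.

```python
def cached_policy_version_lag(explain: dict, policies: dict, index: str) -> int:
    """Return how many versions the index's cached phase definition lags behind."""
    info = explain["indices"][index]
    cached = info["phase_execution"]["version"]      # version the index is executing
    latest = policies[info["policy"]]["version"]     # latest stored policy version
    return latest - cached


# Trimmed copies of the _ilm/explain and _ilm/policy responses from this thread.
explain = {
    "indices": {
        "soc-2023.06.11-000002": {
            "policy": "soc_datastream_policy",
            "phase_execution": {"version": 1},
        }
    }
}
policies = {"soc_datastream_policy": {"version": 7}}

lag = cached_policy_version_lag(explain, policies, "soc-2023.06.11-000002")
print(lag)  # 6
```

A non-zero lag only says the index is still executing a phase definition cached from an older policy version; it does not say which settings differ.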

I'll create another index with the same version throughout its life cycle and get back to you.
Thanks for the reference there.


Ohhh doh! It is right there

And by the way, you should be setting max_primary_shard_size, not max_size, which is the total index size (old, deprecated, bad).

        "hot": {
          "min_age": "0ms",
          "actions": {
            "forcemerge": {
              "max_num_segments": 1
            },
            "rollover": {
              "max_age": "1d",
              "max_primary_shard_size": "10gb" <!--- Correct New Way of setting sizes
            }
          }
        },

@raymondmintz11 You can manually force a rollover, and then it will pick up the new policy. That may be your simple solution.
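
As a rough sketch, a manual rollover is a `POST` to the `_rollover` endpoint of the rollover alias. The Python snippet below only builds the request and does not send it. `build_rollover_request` is a hypothetical helper, and `devtest` is the alias from the test script earlier in the thread, since the alias for the `soc-*` indices was not shown.

```python
import urllib.request


def build_rollover_request(alias: str, base_url: str = "http://localhost:9200") -> urllib.request.Request:
    """Build (but do not send) a manual rollover request for a rollover alias."""
    return urllib.request.Request(f"{base_url}/{alias}/_rollover", method="POST")


# "devtest" is the alias from the first test script; substitute your own alias.
req = build_rollover_request("devtest")
print(req.get_method(), req.full_url)
# To actually send it against a live cluster: urllib.request.urlopen(req)
```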

Hi, please allow me to start over; I think I'm confusing people here.

Here is what I expect:
Indices will roll over (if applicable), and all indices associated with the ILM policy will be deleted after 1 hour.

I'm pasting my settings and ILM policy here, and will respond again in 1 hour with the results of my test.

[Fri Jun 30 19:14:22] root@someHost:/soc# curl -s localhost:9200/_ilm/policy/soc_datastream_policy | jq . 
{
  "soc_datastream_policy": {
    "version": 9,
    "modified_date": "2023-06-30T19:14:13.369Z",
    "policy": {
      "phases": {
        "hot": {
          "min_age": "0ms",
          "actions": {
            "set_priority": {
              "priority": 100
            },
            "rollover": {
              "max_size": "300gb"
            }
          }
        },
        "delete": {
          "min_age": "1h",
          "actions": {
            "delete": {
              "delete_searchable_snapshot": true
            }
          }
        }
      },
      "_meta": {
        "description": "SECC testing index deletion",
        "project": {
          "name": "SECC-5218",
          "department": "SECC"
        }
      }
    },
    "in_use_by": {
      "indices": [
        "soc-2023.06.11-000002",
        "soc-2023.06.09-000002",
        "soc-2023.06.30-000001"
      ],
      "data_streams": [],
      "composable_templates": []
    }
  }
}
[Fri Jun 30 19:14:34] root@someHost:/soc# 

@stephenb looks like my ILM is not deleting the index.

# curl -s http://localhost:9200/soc-2023.06.30-000001/_ilm/explain | jq . 
{
  "indices": {
    "soc-2023.06.30-000001": {
      "index": "soc-2023.06.30-000001",
      "managed": true,
      "policy": "soc_datastream_policy",
      "index_creation_date_millis": 1688151138811,
      "time_since_index_creation": "1.4h",
      "lifecycle_date_millis": 1688151138811,
      "age": "1.4h",
      "phase": "hot",
      "phase_time_millis": 1688152296524,
      "action": "rollover",
      "action_time_millis": 1688152298725,
      "step": "check-rollover-ready",
      "step_time_millis": 1688152298725,
      "phase_execution": {
        "policy": "soc_datastream_policy",
        "phase_definition": {
          "min_age": "0ms",
          "actions": {
            "set_priority": {
              "priority": 100
            },
            "rollover": {
              "max_size": "300gb"
            }
          }
        },
        "version": 9,
        "modified_date_in_millis": 1688152453369
      }
    }
  }
}

Hi @raymondmintz11

I think you are confused about how ILM works.

From the docs here

Rollover condition blocks phase transition

The rollover action only completes if one of its conditions is met. This means that any subsequent phases are blocked until rollover succeeds.

For example, the following policy deletes the index one day after it rolls over. It does not delete the index one day after it was created.

In your case, the timing of the next phase (delete) is calculated from the rollover time, not the index creation time. This is a common misconception... (Note: there is a way to base it on creation time, but that has its own pros and cons.)

Since your index has not rolled over yet, the 1h countdown for the delete phase has not started, and it will not start UNTIL the index rolls over.

The Delete time will be ~ 1 hour AFTER rollover.

So here is the explanation:

Your policy says to roll over the index when the total primary size of the index reaches 300GB. There is no time limit (it could take minutes, days, weeks, etc.). If you want a time limit for rollover in hot, you need to add one; as written, this policy will not roll over until the size is met.

"phases": {
        "hot": {
          "min_age": "0ms",
          "actions": {
            "set_priority": {
              "priority": 100
            },
            "rollover": {
              "max_size": "300gb"
            }
          }
        },

See all the conditions you can use for rollover ... as I stated you are using a deprecated setting

If you want time-based rollover, you need to set max_age, but that can have consequences: you could end up with lots of small indices... or not.

And your _ilm/explain says exactly that: this index is still waiting to roll over... so it will never be deleted, because the index has not rolled over.

Since you did not show the actual size of the index (pri.store.size), I cannot be sure, and can only presume the index is still less than 300GB.
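
To make the rule concrete, here is a simplified Python model of the check. It is only a sketch: it covers just `max_size` and `max_age`, treats rollover as ready when any one configured condition is met, and ignores any other checks ILM may apply (such as how empty indices are handled). The 40gb index size is a made-up figure, since the real size was not posted.

```python
import re

# Elasticsearch byte-size units are powers of 1024.
SIZE_UNITS = {"b": 1, "kb": 1024, "mb": 1024**2, "gb": 1024**3, "tb": 1024**4}
AGE_UNITS = {"ms": 0.001, "s": 1, "m": 60, "h": 3600, "d": 86400}


def parse_value(text: str, units: dict) -> float:
    """Parse values such as '300gb' or '1h' into bytes or seconds."""
    match = re.fullmatch(r"(\d+(?:\.\d+)?)([a-z]+)", text.strip().lower())
    if match is None or match.group(2) not in units:
        raise ValueError(f"unsupported value: {text!r}")
    return float(match.group(1)) * units[match.group(2)]


def rollover_ready(conditions: dict, primary_size_bytes: float, age_seconds: float) -> bool:
    """Simplified model: rollover is ready when ANY configured max_* condition is met."""
    checks = []
    if "max_size" in conditions:
        checks.append(primary_size_bytes >= parse_value(conditions["max_size"], SIZE_UNITS))
    if "max_age" in conditions:
        checks.append(age_seconds >= parse_value(conditions["max_age"], AGE_UNITS))
    return any(checks)


size = 40 * 1024**3   # hypothetical 40gb of primary data
age = 1.4 * 3600      # "age": "1.4h" from the explain output

size_only = rollover_ready({"max_size": "300gb"}, size, age)
with_age = rollover_ready({"max_size": "300gb", "max_age": "1h"}, size, age)
print(size_only, with_age)  # False True
```

With only `max_size` configured, the index age never matters, which matches the index sitting in check-rollover-ready at 1.4h.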

      "age": "1.4h",
      "phase": "hot", <! -- Still In Hot
      "phase_time_millis": 1688152296524,
      "action": "rollover",
      "action_time_millis": 1688152298725,
      "step": "check-rollover-ready", <!-- Checking / Waiting for Rollover Condition that has not been met 
      "step_time_millis": 1688152298725,
      "phase_execution": {
        "policy": "soc_datastream_policy",
        "phase_definition": {
          "min_age": "0ms",
          "actions": {
            "set_priority": {
              "priority": 100
            },
            "rollover": {
              "max_size": "300gb"
            }
          }
        },

Hope that helps...

Summary: DELETE is only calculated AFTER the rollover ... no rollover...no delete

If you want to delete any index after an hour

set max_age 1h in hot
and min_age 0s in delete

Careful

So here is another example... if, say, you were purely using time and you set
max_age to 1h in hot and min_age to 1h in delete... the index would be deleted ~2 hours after creation:
1 hour to roll over + 1 hour waiting for the delete phase.

And again, in real cases with large amounts of data, max_primary_shard_size 50GB rolls over in about a day or some portion of a day... then delete at 30 days (after rollover)... so your data is deleted after about 31 days, or perhaps 30.5... so at scale and over time it works as expected.
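
The arithmetic above can be sketched in a few lines of Python. This is only a back-of-the-envelope model, assuming the delete phase's `min_age` is counted from the rollover time as described; `approx_age_at_delete` is a hypothetical helper name.

```python
from datetime import timedelta


def approx_age_at_delete(time_to_rollover: timedelta, delete_min_age: timedelta) -> timedelta:
    """Approximate index age at deletion: time until rollover plus delete min_age."""
    return time_to_rollover + delete_min_age


one_hour = timedelta(hours=1)

# max_age 1h in hot, min_age 0s in delete -> deleted about 1 hour after creation
print(approx_age_at_delete(one_hour, timedelta(0)))                  # 1:00:00
# max_age 1h in hot, min_age 1h in delete -> about 2 hours after creation
print(approx_age_at_delete(one_hour, one_hour))                      # 2:00:00
# rollover after about a day, min_age 30d in delete -> about 31 days
print(approx_age_at_delete(timedelta(days=1), timedelta(days=30)))   # 31 days, 0:00:00
```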


@stephenb Thank you so much for the clarification there. That changes the way I'm looking at this config.

I did run into one discrepancy: delete doesn't have a max_age, but it does have a min_age.
This is what I will run to test delete with:

[Mon Jul 03 16:33:36] root@someHost:~# 
[Mon Jul 03 16:33:37] root@someHost:~# indices.sh 
green open soc-2023.07.03-000001 yEJd5GexQV-70ZXon19bIw 10 1 0 0 4.3kb 2.1kb
[Mon Jul 03 16:33:42] root@someHost:~# 
[Mon Jul 03 16:33:44] root@someHost:~# 
[Mon Jul 03 16:33:44] root@someHost:~# curl -s localhost:9200/_ilm/policy/soc_datastream_policy | jq . 
{
  "soc_datastream_policy": {
    "version": 12,
    "modified_date": "2023-07-03T16:30:25.867Z",
    "policy": {
      "phases": {
        "hot": {
          "min_age": "0ms",
          "actions": {
            "rollover": {
              "max_size": "10gb",
              "max_age": "1h"
            }
          }
        },
        "delete": {
          "min_age": "0s",
          "actions": {
            "delete": {
              "delete_searchable_snapshot": true
            }
          }
        }
      }
    },
    "in_use_by": {
      "indices": [
        "soc-2023.07.03-000001"
      ],
      "data_streams": [],
      "composable_templates": []
    }
  }
}
[Mon Jul 03 16:33:52] root@someHost:~# 
[Mon Jul 03 16:35:20] root@someHost:~# 
[Mon Jul 03 16:35:20] root@someHost:~# curl -s localhost:9200/soc-2023.07.03/_ilm/explain | jq . 
{
  "indices": {
    "soc-2023.07.03-000001": {
      "index": "soc-2023.07.03-000001",
      "managed": true,
      "policy": "soc_datastream_policy",
      "index_creation_date_millis": 1688402011248,
      "time_since_index_creation": "2.11m",
      "lifecycle_date_millis": 1688402011248,
      "age": "2.11m",
      "phase": "hot",
      "phase_time_millis": 1688402013303,
      "action": "rollover",
      "action_time_millis": 1688402013503,
      "step": "check-rollover-ready",
      "step_time_millis": 1688402013503,
      "phase_execution": {
        "policy": "soc_datastream_policy",
        "phase_definition": {
          "min_age": "0ms",
          "actions": {
            "rollover": {
              "max_size": "10gb",
              "max_age": "1h"
            }
          }
        },
        "version": 12,
        "modified_date_in_millis": 1688401825867
      }
    }
  }
}
[Mon Jul 03 16:35:37] root@someHost:~# 



Yup That was a typo on my part!
Good catch... I fixed it for others.

@stephenb I'm seeing the index roll over shortly after the max_age threshold is met (as expected), and then I see the previous index get deleted as well (as expected).

My question now is: when does rollover stop, so that delete can remove the last index?

What I'm observing is that I get stuck with one dangling index.
For example, I can see that 00000{1,2,3} have been deleted successfully, but the last index rolls over again, leaving me with another straggler.

In the example below, the other indices have been deleted fine, but I still have this one:

green open soc-2023.07.03-000004 hXJzglI1Q8-tqE2HjivZEg 10 1 0 0 4.3kb 2.1kb

Information about the ILM policy and index:

[Mon Jul 03 19:32:30] root@someHost:~# curl -s localhost:9200/soc-2023.07.03-000004/_ilm/explain | jq . 
{
  "indices": {
    "soc-2023.07.03-000004": {
      "index": "soc-2023.07.03-000004",
      "managed": true,
      "policy": "soc_datastream_policy",
      "index_creation_date_millis": 1688412408507,
      "time_since_index_creation": "5.73m",
      "lifecycle_date_millis": 1688412408507,
      "age": "5.73m",
      "phase": "hot",
      "phase_time_millis": 1688412410373,
      "action": "rollover",
      "action_time_millis": 1688412410773,
      "step": "check-rollover-ready",
      "step_time_millis": 1688412410773,
      "phase_execution": {
        "policy": "soc_datastream_policy",
        "phase_definition": {
          "min_age": "0ms",
          "actions": {
            "rollover": {
              "max_size": "10gb",
              "max_age": "1h"
            }
          }
        },
        "version": 12,
        "modified_date_in_millis": 1688401825867
      }
    }
  }
}
[Mon Jul 03 19:32:32] root@someHost:~# curl -s localhost:9200/_ilm/policy/soc_datastream_policy | jq . 
{
  "soc_datastream_policy": {
    "version": 12,
    "modified_date": "2023-07-03T16:30:25.867Z",
    "policy": {
      "phases": {
        "hot": {
          "min_age": "0ms",
          "actions": {
            "rollover": {
              "max_size": "10gb",
              "max_age": "1h"
            }
          }
        },
        "delete": {
          "min_age": "0s",
          "actions": {
            "delete": {
              "delete_searchable_snapshot": true
            }
          }
        }
      }
    },
    "in_use_by": {
      "indices": [
        "soc-2023.07.03-000004"
      ],
      "data_streams": [],
      "composable_templates": []
    }
  }
}
[Mon Jul 03 19:32:41] root@someHost:~# 

Just to reiterate my problem with the actual last index that does not delete: it's in hot and seems like it's not going to roll over, but most of all, I'm not able to delete the index completely 1 hour after rollover (rollover is never accomplished, even though we have max_age set to 1h).

[Mon Jul 03 20:55:02] root@someHost:~# curl -s localhost:9200/soc-2023.07.03-000004/_ilm/explain | jq .
{
  "indices": {
    "soc-2023.07.03-000004": {
      "index": "soc-2023.07.03-000004",
      "managed": true,
      "policy": "soc_datastream_policy",
      "index_creation_date_millis": 1688412408507,
      "time_since_index_creation": "1.47h",
      "lifecycle_date_millis": 1688412408507,
      "age": "1.47h",
      "phase": "hot",
      "phase_time_millis": 1688412410373,
      "action": "rollover",
      "action_time_millis": 1688412410773,
      "step": "check-rollover-ready",
      "step_time_millis": 1688412410773,
      "phase_execution": {
        "policy": "soc_datastream_policy",
        "phase_definition": {
          "min_age": "0ms",
          "actions": {
            "rollover": {
              "max_size": "10gb",
              "max_age": "1h"
            }
          }
        },
        "version": 12,
        "modified_date_in_millis": 1688401825867
      }
    }
  }
}
[Mon Jul 03 20:55:12] root@someHost:~# 
[Mon Jul 03 20:55:13] root@someHost:~# 
[Mon Jul 03 20:55:13] root@someHost:~# curl -s localhost:9200/_ilm/policy/soc_datastream_policy | jq . 
{
  "soc_datastream_policy": {
    "version": 12,
    "modified_date": "2023-07-03T16:30:25.867Z",
    "policy": {
      "phases": {
        "hot": {
          "min_age": "0ms",
          "actions": {
            "rollover": {
              "max_size": "10gb",
              "max_age": "1h"
            }
          }
        },
        "delete": {
          "min_age": "0s",
          "actions": {
            "delete": {
              "delete_searchable_snapshot": true
            }
          }
        }
      }
    },
    "in_use_by": {
      "indices": [
        "soc-2023.07.03-000004"
      ],
      "data_streams": [],
      "composable_templates": []
    }
  }
}
[Mon Jul 03 20:55:37] root@someHost:~# 
[Mon Jul 03 20:55:39] root@someHost:~# 
[Mon Jul 03 20:55:39] root@someHost:~# indices.sh 
green open soc-2023.07.03-000004 hXJzglI1Q8-tqE2HjivZEg 10 1 0 0 4.3kb 2.1kb
[Mon Jul 03 20:55:41] root@someHost:~#