Index status Open but in Delete phase

Hi

I have recently been working on a new data source for Threat Intel (TI) data to make use of the Indicator Match detection. My plan was to ingest the TI data into an index and then roll that index over every 12 hours, with the old index moving into a delete phase as it expires.

This part works: a cron job downloads TI data every 12 hours, and the Elastic Agent ingests it through a 'Custom Log' integration, which has Filebeat monitor my directory for new TI data.

All of this is working and I have an index template and Ingest pipeline configured.

The problem is that the ILM policy is not working as I would expect. It is set to have a hot phase with a maximum age of 12 hours. The delete phase then activates and is supposed to delete the index 1 minute after rollover.

I now have an index reporting a status of open, but it is in the delete phase.

I don't understand how it can be in the delete phase but not be deleted.

I ran a query on the ILM status on the index as shown below:

GET .ds-logs-threat_intel-default-000001/_ilm/explain
{
  "indices" : {
    ".ds-logs-threat_intel-default-000001" : {
      "index" : ".ds-logs-threat_intel-default-000001",
      "managed" : true,
      "policy" : "threat-intel",
      "lifecycle_date_millis" : 1605756732243,
      "age" : "17.83h",
      "phase" : "delete",
      "phase_time_millis" : 1605757332389,
      "action" : "complete",
      "action_time_millis" : 1605756732744,
      "step" : "complete",
      "step_time_millis" : 1605757332389,
      "phase_execution" : {
        "policy" : "threat-intel",
        "phase_definition" : {
          "min_age" : "1h",
          "actions" : { }
        },
        "version" : 3,
        "modified_date_in_millis" : 1605802413374
      }
    }
  }
}

The explain output confirms that my ILM policy is being applied. Below is the configuration of the ILM policy, named threat-intel:

PUT _ilm/policy/threat-intel
{
  "policy": {
    "phases": {
      "hot": {
        "min_age": "0ms",
        "actions": {
          "rollover": {
            "max_size": "1gb",
            "max_age": "12h"
          },
          "set_priority": {
            "priority": 100
          }
        }
      },
      "delete": {
        "min_age": "1h",
        "actions": {}
      }
    }
  }
}

I would appreciate any ideas as to what I can check to determine why an index would be in the delete phase but not actually be deleted. Could something be holding it open?

I can't see any errors in the elasticsearch.log file. The last log entry relating to this index reports it moving to the delete phase, but nothing more.

[2020-11-19T03:42:12,389][INFO ][o.e.x.i.IndexLifecycleTransition] [itl101400.comm.ad.roke.co.uk] moving index [.ds-logs-threat_intel-default-000001] from [{"phase":"hot","action":"complete","name":"complete"}] to [{"phase":"delete","action":"complete","name":"complete"}] in policy [threat-intel]
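For anyone cross-checking this log entry against the explain output: the `*_millis` fields are epoch milliseconds, so they can be converted to readable times. The following is a minimal Python sketch; the helper name `millis_to_utc` is made up for illustration, and the comparison with the log line assumes the node writes its logs in UTC.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def millis_to_utc(millis):
    """Convert epoch milliseconds (as reported by _ilm/explain) to an ISO 8601 UTC string."""
    # Integer milliseconds added to the epoch avoid float rounding.
    return (EPOCH + timedelta(milliseconds=millis)).isoformat(timespec="milliseconds")


# Values copied from the explain output above.
print(millis_to_utc(1605756732243))  # lifecycle_date_millis -> 2020-11-19T03:32:12.243+00:00
print(millis_to_utc(1605757332389))  # phase_time_millis     -> 2020-11-19T03:42:12.389+00:00
```

Under that assumption, `phase_time_millis` lines up with the 03:42:12,389 timestamp of the log entry, i.e. the moment the index moved into the delete phase.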

Any guidance appreciated.

Hello Phil,

Your ILM delete phase has no action:

"delete": {
  "min_age": "1h",
  "actions": {}
}

You have to add the delete action to the delete phase:

"delete": {
  "min_age": "1h",
  "actions": {
    "delete" : { }
  }
}
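If you want to catch this before applying a policy, a small check on the policy body can help. This is only a minimal sketch in Python, assuming the policy JSON has already been loaded into a dict; the function name `delete_phase_has_delete_action` is hypothetical and not part of any Elastic client.

```python
def delete_phase_has_delete_action(policy_body):
    """Return True only if the policy's delete phase contains a "delete" action."""
    phases = policy_body.get("policy", {}).get("phases", {})
    delete_phase = phases.get("delete")
    if delete_phase is None:
        return False  # the policy has no delete phase at all
    # "actions" may be missing, empty, or null; treat all of those as "no actions".
    actions = delete_phase.get("actions") or {}
    return "delete" in actions


# The two delete phases shown above, wrapped in a policy body.
broken = {"policy": {"phases": {"delete": {"min_age": "1h", "actions": {}}}}}
fixed = {"policy": {"phases": {"delete": {"min_age": "1h", "actions": {"delete": {}}}}}}

print(delete_phase_has_delete_action(broken))  # False
print(delete_phase_has_delete_action(fixed))   # True
```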

Best regards
Wolfram

Hello Wolfram

Thank you very much for your reply. I did think it seemed a bit odd that the action had no parameters.

There may be a bug in Kibana: whenever I create a new ILM policy with the delete phase enabled, the delete phase's actions come out empty. To work around this, I have submitted the corrected policy via Dev Tools, and this seems to work.

I have 2 installs running (test and live), both on v7.10 and both behave the same with the ILM policy creation.

To confirm, I have just gone to create a new ILM policy in Kibana. The only change I made from the defaults was to activate the delete phase, and when I click 'Show request' in the bottom right, the action shows as null.

As a workaround, I copied the request from the 'Show request' window into Dev Tools and submitted it there, adding the delete action you suggested to the delete phase:

PUT _ilm/policy/threat-intel
{
  "policy": {
    "phases": {
      "hot": {
        "min_age": "0ms",
        "actions": {
          "rollover": {
            "max_size": "1gb",
            "max_age": "12h"
          },
          "set_priority": {
            "priority": 100
          }
        }
      },
      "delete": {
        "min_age": "1h",
        "actions": {
          "delete" : { }
        }
      }
    }
  }
}

If I then query the policy, it shows as correct:

GET _ilm/policy/threat-intel
{
  "threat-intel" : {
    "version" : 4,
    "modified_date" : "2020-11-20T05:45:58.239Z",
    "policy" : {
      "phases" : {
        "hot" : {
          "min_age" : "0ms",
          "actions" : {
            "rollover" : {
              "max_size" : "1gb",
              "max_age" : "12h"
            },
            "set_priority" : {
              "priority" : 100
            }
          }
        },
        "delete" : {
          "min_age" : "1h",
          "actions" : {
            "delete" : {
              "delete_searchable_snapshot" : true
            }
          }
        }
      }
    }
  }
}

If I then check the policy in the UI by simply viewing it and clicking 'Show request', the delete action is there.

As it stands, my indices are all still there and haven't been deleted, but I only changed the policy a couple of hours ago, and perhaps the older indices are in a funny state. I will see what happens over the weekend, by which time a few indices should have been created and deleted, and then post an update.

Hello Phil,

I only have a 7.9 cluster running and I couldn't replicate your problem there - maybe someone else can replicate it?

I don't think the existing indices will be deleted automatically, as their status shows complete in your initial post. It is just a guess, but I think the delete phase was started and, as no actions were found, the status was set to complete. Maybe it would make sense to investigate whether Elasticsearch could set it to error if a phase is enabled but contains no actions?

To get rid of the old indices, you could either delete them manually with DELETE .ds-logs-threat_intel-default-000001 (newly created indices should be deleted automatically in the delete phase), or you could try the ILM retry API, although the documentation only mentions retries for indices in an ERROR state:
POST /.ds-logs-threat_intel-default-000001/_ilm/retry
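To find which indices are affected, you could scan an explain response for indices that are already marked complete in the delete phase. This is just a sketch in Python, assuming the response has the same shape as the explain output in the first post and has been parsed into a dict; the function name `completed_in_delete_phase` is hypothetical.

```python
def completed_in_delete_phase(explain_response):
    """List managed indices whose ILM state is phase "delete", step "complete".

    Such indices have finished the delete phase without being removed,
    so they are candidates for manual deletion.
    """
    found = []
    for name, info in explain_response.get("indices", {}).items():
        if (
            info.get("managed")
            and info.get("phase") == "delete"
            and info.get("step") == "complete"
        ):
            found.append(name)
    return sorted(found)


# Trimmed-down copy of the explain output from the first post.
sample = {
    "indices": {
        ".ds-logs-threat_intel-default-000001": {
            "managed": True,
            "policy": "threat-intel",
            "phase": "delete",
            "action": "complete",
            "step": "complete",
        }
    }
}

print(completed_in_delete_phase(sample))
# ['.ds-logs-threat_intel-default-000001']
```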

Best regards
Wolfram

Thanks for checking Wolfram. The last time I created an ILM policy was 7.8 or 7.9 so I think this may be a new bug but hopefully someone else can check and confirm.

I agree with your comments. The indices are complete, so they will likely not be deleted. Indices created from now on will hopefully be deleted, so I will check on Monday, when some should have been deleted, and confirm. If so, I will manually delete the old ones.

Thank you again for your help.