Index lifecycle management runs into IndexNotFoundException

Hi, I'm starting to use an index lifecycle management (ILM) policy to delete an index once it has existed for xx time. Here's the policy:

{
    "policy": {
        "phases": {
            "hot": {
                "actions": {}
            },
            "delete": {
                "min_age": xx,
                "actions": {
                    "delete": {}
                }
            }
        }
    }
}
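
For illustration, the policy above can be built with the min_age filled in from a variable. This is only a sketch: the function name build_delete_policy is made up for the example, and it assumes min_age is passed as a time-unit string such as "28d" or "50ms" so that the resulting JSON is valid.

```python
import json


def build_delete_policy(min_age: str) -> dict:
    """Return an ILM policy body that deletes an index once it is min_age old.

    build_delete_policy is a hypothetical helper, not part of any client library.
    """
    return {
        "policy": {
            "phases": {
                # Empty hot phase, as in the policy posted above.
                "hot": {"actions": {}},
                "delete": {
                    # Assumed to be a time-unit string, e.g. "28d" or "50ms".
                    "min_age": min_age,
                    "actions": {"delete": {}},
                },
            }
        }
    }


if __name__ == "__main__":
    # Print the body that would be sent with PUT _ilm/policy/<policy name>.
    print(json.dumps(build_delete_policy("28d"), indent=2))
```

Building the body in one place means the test value and the production value (28 days) go through the same code path.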

The delete min_age is a variable, and the policy is referenced from an index template.
In my test, I set the delete min_age to 50ms, create an index template that uses the policy, create an index that matches the template, then wait 50ms and check in a loop whether the index has been deleted.
But I sometimes run into

org.elasticsearch.index.IndexNotFoundException: no such index

Other times the test passes.
Here's a more detailed log:

[2019-02-22T15:39:56,990][INFO ][o.e.c.m.MetaDataIndexTemplateService] [ejsJGon] adding template [template_test-1550849985826781595] for index patterns [test-1550849985826781595-*]
[2019-02-22T15:39:57,656][INFO ][o.e.c.m.MetaDataCreateIndexService] [ejsJGon] [test-1550849985826781595-2019-02-11] creating index, cause [api], templates [template_test-1550849985826781595], shards [5]/[2], mappings [_doc]
[2019-02-22T15:40:05,073][INFO ][o.e.c.m.MetaDataDeleteIndexService] [ejsJGon] [test-1550849985826781595-2019-02-11/yD0Ns76GR3WddMX_sjRoNQ] deleting index
[2019-02-22T15:40:05,207][ERROR][o.e.x.i.IndexLifecycleRunner] [ejsJGon] policy [test-1550849985826781595_policy] for index [test-1550849985826781595-2019-02-11] failed on step [{"phase":"delete","action":"delete","name":"delete"}]. Moving to ERROR step
org.elasticsearch.index.IndexNotFoundException: no such index
	at org.elasticsearch.cluster.metadata.MetaData.getIndexSafe( ~[elasticsearch-6.6.0.jar:6.6.0]
	at org.elasticsearch.cluster.metadata.MetaDataDeleteIndexService.lambda$deleteIndices$0( ~[elasticsearch-6.6.0.jar:6.6.0]
	at$3$1.accept( ~[?:?]
	at java.util.HashMap$KeySpliterator.forEachRemaining( ~[?:?]
	at ~[?:?]
	at ~[?:?]
	at$ReduceOp.evaluateSequential( ~[?:?]
	at ~[?:?]
	at ~[?:?]
	at org.elasticsearch.cluster.metadata.MetaDataDeleteIndexService.deleteIndices( ~[elasticsearch-6.6.0.jar:6.6.0]
	at org.elasticsearch.cluster.metadata.MetaDataDeleteIndexService$1.execute( ~[elasticsearch-6.6.0.jar:6.6.0]
	at org.elasticsearch.cluster.ClusterStateUpdateTask.execute( ~[elasticsearch-6.6.0.jar:6.6.0]
	at org.elasticsearch.cluster.service.MasterService.executeTasks( ~[elasticsearch-6.6.0.jar:6.6.0]
	at org.elasticsearch.cluster.service.MasterService.calculateTaskOutputs( ~[elasticsearch-6.6.0.jar:6.6.0]
	at org.elasticsearch.cluster.service.MasterService.runTasks( [elasticsearch-6.6.0.jar:6.6.0]
	at org.elasticsearch.cluster.service.MasterService$ [elasticsearch-6.6.0.jar:6.6.0]
	at org.elasticsearch.cluster.service.TaskBatcher.runIfNotProcessed( [elasticsearch-6.6.0.jar:6.6.0]
	at org.elasticsearch.cluster.service.TaskBatcher$ [elasticsearch-6.6.0.jar:6.6.0]
	at org.elasticsearch.common.util.concurrent.ThreadContext$ [elasticsearch-6.6.0.jar:6.6.0]
	at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.runAndClean( [elasticsearch-6.6.0.jar:6.6.0]
	at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$ [elasticsearch-6.6.0.jar:6.6.0]
	at java.util.concurrent.ThreadPoolExecutor.runWorker( [?:?]
	at java.util.concurrent.ThreadPoolExecutor$ [?:?]
	at [?:?]

So in the log, IndexLifecycleRunner names the correct index, but it cannot find that index.
Is that because 50ms is too small? Can I get some help with this? Thanks!

I do not see the point in setting min_age to 50ms. What is the use case? What are you trying to test or achieve?

Thanks for replying. I'm trying to test whether my ILM policy definition is correct and whether my code integrates it into the index template correctly. By setting min_age to a small value, I can check that it works in a Go unit test. The eventual use case is to use ILM to delete indices older than 28 days. Right now we're using Curator to delete old indices, and we would like to replace it with ILM.
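
One way to check the template integration without waiting for a deletion is to assert on the template body itself. The sketch below is written in Python for brevity, and the same body applies from a Go test. It assumes the policy is attached through the index.lifecycle.name index setting; build_template, template_uses_policy, and the names "test-*" and "test_policy" are made up for the example.

```python
def build_template(index_pattern: str, policy_name: str) -> dict:
    """Return an index template body that attaches an ILM policy to new indices.

    build_template is a hypothetical helper for this example.
    """
    return {
        "index_patterns": [index_pattern],
        "settings": {
            # Indices created from this template are managed by this policy.
            "index.lifecycle.name": policy_name,
        },
    }


def template_uses_policy(template: dict, policy_name: str) -> bool:
    """Check that a template body references the expected ILM policy."""
    settings = template.get("settings", {})
    return settings.get("index.lifecycle.name") == policy_name


if __name__ == "__main__":
    # Body that would be sent with PUT _template/<template name>.
    template = build_template("test-*", "test_policy")
    print(template_uses_policy(template, "test_policy"))
```

A check like this is deterministic, so it does not depend on when ILM next evaluates the index.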

I do not think that is going to work, as I believe conditions are checked periodically, at an interval far greater than the min_age you have specified. Indices are expected to be reasonably long-lived, so it does not make sense to check conditions extremely frequently, since that would add load to the cluster.
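
As a rough sketch of how a test could work with that periodic check: as far as I know, the interval is controlled by the indices.lifecycle.poll_interval cluster setting (10 minutes by default), so a test cluster can lower it and then poll with a generous timeout instead of waiting exactly min_age. The helper names and the index_exists callable mentioned below are assumptions for the example, not part of any client library.

```python
import time
from typing import Callable


def poll_interval_settings(interval: str = "1s") -> dict:
    """Body for PUT _cluster/settings that shortens the ILM check interval.

    Intended for a throwaway test cluster only.
    """
    return {"transient": {"indices.lifecycle.poll_interval": interval}}


def wait_until(condition: Callable[[], bool], timeout_s: float,
               step_s: float = 0.5) -> bool:
    """Call condition() repeatedly until it returns True or timeout_s elapses."""
    deadline = time.monotonic() + timeout_s
    while True:
        if condition():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(step_s)


if __name__ == "__main__":
    print(poll_interval_settings("1s"))
    # Stand-in condition; a real test would ask the cluster about the index.
    print(wait_until(lambda: True, timeout_s=1.0))
```

In a real test the condition would be something like lambda: not index_exists(name), where index_exists is a hypothetical wrapper around an existence check on the index, and the timeout would be several times the poll interval rather than the 50ms min_age.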
