Elasticsearch deleting 10-day-old index every day automatically

Elasticsearch is installed on AKS, and it is automatically deleting the 10-day-old index every day at the same time.

version:
elasticsearch: 8.17.1
kibana: 8.17.1

Current ILM policy is set to 30 days warm, 180 days delete (see the ILM explain check below).
xpack.security.enabled is true.
Disks have plenty of space.
No cron job in the pod.
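
The ILM explain API shows which phase and step each logstash index is currently in, e.g. from Dev Tools:

GET /logstash-*/_ilm/explain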

{"@timestamp":"2025-07-21T15:00:00.350Z", "log.level":"TRACE", "message":"0 tombstones purged from the cluster state. Previous tombstone size: 446. Current tombstone size: 447.", "ecs.version": "1.2.0","service.name":"ES_ECS","event.dataset":"elasticsearch.server","process.thread.name":"elasticsearch[efk-es-elasticsearch-0][masterService#updateTask][T#1343]","log.logger":"org.elasticsearch.cluster.metadata.MetadataDeleteIndexService","elasticsearch.cluster.uuid":"qy0_RtNxTIa0vtQmUGikSA","elasticsearch.node.id":"7KJtuPA1Q4u1cxdAfzxEVA","elasticsearch.node.name":"efk-es-elasticsearch-0","elasticsearch.cluster.name":"efk"}
{"@timestamp":"2025-07-21T15:00:00.522Z", "log.level": "INFO", "message":"[logstash-2025.07.11/kJLjBTGCTZi0yVTgQ5dlpA] deleting index", "ecs.version": "1.2.0","service.name":"ES_ECS","event.dataset":"elasticsearch.server","process.thread.name":"elasticsearch[efk-es-elasticsearch-0][masterService#updateTask][T#1343]","log.logger":"org.elasticsearch.cluster.metadata.MetadataDeleteIndexService","elasticsearch.cluster.uuid":"qy0_RtNxTIa0vtQmUGikSA","elasticsearch.node.id":"7KJtuPA1Q4u1cxdAfzxEVA","elasticsearch.node.name":"efk-es-elasticsearch-0","elasticsearch.cluster.name":"efk"}
{"@timestamp":"2025-07-21T15:00:00.522Z", "log.level":"TRACE", "message":"0 tombstones purged from the cluster state. Previous tombstone size: 447. Current tombstone size: 448.", "ecs.version": "1.2.0","service.name":"ES_ECS","event.dataset":"elasticsearch.server","process.thread.name":"elasticsearch[efk-es-elasticsearch-0][masterService#updateTask][T#1343]","log.logger":"org.elasticsearch.cluster.metadata.MetadataDeleteIndexService","elasticsearch.cluster.uuid":"qy0_RtNxTIa0vtQmUGikSA","elasticsearch.node.id":"7KJtuPA1Q4u1cxdAfzxEVA","elasticsearch.node.name":"efk-es-elasticsearch-0","elasticsearch.cluster.name":"efk"}
{"@timestamp":"2025-07-21T15:06:04.135Z", "log.level":"DEBUG", "message":"[.internal.alerts-default.alerts-default-000001] running periodic policy with current-step [{\"phase\":\"hot\",\"action\":\"rollover\",\"name\":\"check-rollover-ready\"}]", "ecs.version": "1.2.0","service.name":"ES_ECS","event.dataset":"elasticsearch.server","process.thread.name":"elasticsearch[efk-es-elasticsearch-0][trigger_engine_scheduler][T#1]","log.logger":"org.elasticsearch.xpack.ilm.IndexLifecycleRunner","elasticsearch.cluster.uuid":"qy0_RtNxTIa0vtQmUGikSA","elasticsearch.node.id":"7KJtuPA1Q4u1cxdAfzxEVA","elasticsearch.node.name":"efk-es-elasticsearch-0","elasticsearch.cluster.name":"efk"}

Elasticsearch does not automatically delete data, so something is doing it. As it happens at a specific time, it sounds like it could be an external cron job invoking something like Curator. Check which roles have the privilege to delete indices and remove it from roles that should not have it.
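
For example, in Kibana Dev Tools something along these lines lists the roles and users, so you can spot anything granting delete, delete_index, manage or all on the logstash-* pattern (just a sketch, adapt it to however you manage security):

# List every role and its index privileges
GET /_security/role

# List users and the roles assigned to them
GET /_security/user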

Thanks. I just went through all the roles; only the superuser can delete that index pattern.
No Curator or cron job is defined either.
We have set up three environments with the same Helm chart, the only difference being the hostname, and it only happens in one of the environments.

Welcome to the forum @leo-lee

An interesting issue. I was wondering how I would troubleshoot in your shoes. Asking here was a good first step.

From the above example, on 21st July "something" deleted the logstash index from 11th July at 15:00. So you can probably predict with some confidence what "it" will try to do today, tomorrow, etc. So how about preventing that delete from succeeding, i.e. forcing an error? (Temporarily) stop it from being able to delete that specific index, and see what complains or what errors might be thrown somewhere.

Today is 23rd July, so something will presumably try to delete the index logstash-2025.07.13. I'm not sure of the best way to force an error on a deletion attempt; maybe:

PUT /logstash-2025.07.13/_settings
{
    "index.blocks.metadata": true
}

??

Also, you haven't said what is feeding data into your logstash-* indices. Some integration, something put together locally, ... Whatever it is might have cron-like index-management logic in it that doesn't apply in your other 2 clusters, but does apply here.

Thanks. We have Fluent Bit in an app, which passes to Fluentd, which feeds into the EFK cluster. We haven't set up any cron job, just the ILM policy.

I've updated the index with your suggestion and also removed the ILM policy. The delete will happen at 11pm AWST; I'll let you know the result.

{
  "logstash-2025.07.13": {
    "settings": {
      "index": {
        "routing": {
          "allocation": {
            "include": {
              "_tier_preference": "data_content"
            }
          }
        },
        "number_of_shards": "1",
        "blocks": {
          "write": "true",
          "metadata": "false",
          "read": "false"
        },
        "provided_name": "logstash-2025.07.13",
        "creation_date": "1752364805404",
        "priority": "100",
        "number_of_replicas": "1",
        "uuid": "9KgA6lXoQo-rt4PkhY18kA",
        "version": {
          "created": "8521000"
        }
      }
    }
  }
}

Well, I suggested setting index.blocks.metadata to true; you look to have set index.blocks.write to true, which AFAIK won't prevent index deletion.

And if you want to get to the root cause, it's usually best to change one thing at a time, then test/measure, and iterate on that. If you don't really care about the why, of course you can try parallel approaches.

This gives no errors:

PUT /indextest/_doc/1
{ "key": "value"}
PUT /indextest/_settings
{
    "index.blocks.write": true
}
DELETE /indextest

whereas this gives an error on the first DELETE:

PUT /indextest/_doc/1
{ "key": "value"}
PUT /indextest/_settings
{
    "index.blocks.metadata": true
}
DELETE /indextest
PUT /indextest/_settings
{
    "index.blocks.metadata": false
}
DELETE /indextest

You are right; logstash-2025.07.13 was deleted last night.

{"@timestamp":"2025-07-23T15:00:00.385Z", "log.level":"TRACE", "message":"0 tombstones purged from the cluster state. Previous tombstone size: 452. Current tombstone size: 453.", "ecs.version": "1.2.0","service.name":"ES_ECS","event.dataset":"elasticsearch.server","process.thread.name":"elasticsearch[efk-es-elasticsearch-0][masterService#updateTask][T#1942]","log.logger":"org.elasticsearch.cluster.metadata.MetadataDeleteIndexService","elasticsearch.cluster.uuid":"qy0_RtNxTIa0vtQmUGikSA","elasticsearch.node.id":"7KJtuPA1Q4u1cxdAfzxEVA","elasticsearch.node.name":"efk-es-elasticsearch-0","elasticsearch.cluster.name":"efk"}
{"@timestamp":"2025-07-23T15:00:00.484Z", "log.level": "INFO", "message":"[logstash-2025.07.13/9KgA6lXoQo-rt4PkhY18kA] deleting index", "ecs.version": "1.2.0","service.name":"ES_ECS","event.dataset":"elasticsearch.server","process.thread.name":"elasticsearch[efk-es-elasticsearch-0][masterService#updateTask][T#1942]","log.logger":"org.elasticsearch.cluster.metadata.MetadataDeleteIndexService","elasticsearch.cluster.uuid":"qy0_RtNxTIa0vtQmUGikSA","elasticsearch.node.id":"7KJtuPA1Q4u1cxdAfzxEVA","elasticsearch.node.name":"efk-es-elasticsearch-0","elasticsearch.cluster.name":"efk"}
{"@timestamp":"2025-07-23T15:00:00.484Z", "log.level":"TRACE", "message":"0 tombstones purged from the cluster state. Previous tombstone size: 453. Current tombstone size: 454.", "ecs.version": "1.2.0","service.name":"ES_ECS","event.dataset":"elasticsearch.server","process.thread.name":"elasticsearch[efk-es-elasticsearch-0][masterService#updateTask][T#1942]","log.logger":"org.elasticsearch.cluster.metadata.MetadataDeleteIndexService","elasticsearch.cluster.uuid":"qy0_RtNxTIa0vtQmUGikSA","elasticsearch.node.id":"7KJtuPA1Q4u1cxdAfzxEVA","elasticsearch.node.name":"efk-es-elasticsearch-0","elasticsearch.cluster.name":"efk"}
{"@timestamp":"2025-07-23T15:06:04.135Z", "log.level":"DEBUG", "message":"[.internal.alerts-default.alerts-default-000001] running periodic policy with current-step [{\"phase\":\"hot\",\"action\":\"rollover\",\"name\":\"check-rollover-ready\"}]", "ecs.version": "1.2.0","service.name":"ES_ECS","event.dataset":"elasticsearch.server","process.thread.name":"elasticsearch[efk-es-elasticsearch-0][trigger_engine_scheduler][T#1]","log.logger":"org.elasticsearch.xpack.ilm.IndexLifecycleRunner","elasticsearch.cluster.uuid":"qy0_RtNxTIa0vtQmUGikSA","elasticsearch.node.id":"7KJtuPA1Q4u1cxdAfzxEVA","elasticsearch.node.name":"efk-es-elasticsearch-0","elasticsearch.cluster.name":"efk"}

I've now set logstash-2025.07.14 with:

PUT /logstash-2025.07.14/_settings
{
  "index.blocks.metadata": true
}

and the Index Management page shows the updated settings (screenshot omitted).

I'll see if that catches the delete error tonight. Thanks!

Can you share your ILM policy and the template that is being applied to this specific index?
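
(Both can be pulled from Dev Tools; called without a name these return every ILM policy and every composable template, which is fine for a quick look:)

GET /_ilm/policy
GET /_index_template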

{
  "logstash-policy": {
    "version": 9,
    "modified_date": "2025-07-16T08:07:51.630Z",
    "policy": {
      "phases": {
        "delete": {
          "min_age": "180d",
          "actions": {
            "delete": {
              "delete_searchable_snapshot": false
            }
          }
        },
        "hot": {
          "min_age": "0ms",
          "actions": {
            "set_priority": {
              "priority": 10
            }
          }
        },
        "warm": {
          "min_age": "90d",
          "actions": {
            "set_priority": {
              "priority": 20
            }
          }
        }
      }
    },
    "in_use_by": {
      "indices": [
        "logstash-2025.07.19",
        "logstash-2025.07.18",
        "logstash-2025.07.17",
        "logstash-2025.07.16",
        "logstash-2025.07.15",
        "logstash-2025.07.14",
        "logstash-2025.07.24",
        "logstash-2025.07.23",
        "logstash-2025.07.22",
        "logstash-2025.07.21",
        "logstash-2025.07.20"
      ],
      "data_streams": [],
      "composable_templates": [
        "logstash-template"
      ]
    }
  }
}
{
  "index_templates": [
    {
      "name": "logstash-template",
      "index_template": {
        "index_patterns": [
          "search_statistics-*",
          "statistics-*",
          "logstash-*"
        ],
        "template": {
          "settings": {
            "index": {
              "lifecycle": {
                "name": "logstash-policy"
              }
            }
          }
        },
        "composed_of": [],
        "ignore_missing_component_templates": []
      }
    }
  ]
}

Yeah, the policy seems ok.

Are you really sure that there is nothing else scheduled to delete the data? Do you have a paid license, or can you enable the trial to get audit logging and try to find the source of the requests? It really looks like something is scheduled to delete the data.

Also, just to be sure, did you edit the logs to redact anything? You are mentioning indices named logstash-YYYY.MM.dd, but in your screenshot there is an index named idit-logstash-YYYY.MM.dd. Did you redact the prefix in the logs, or is this a different index?

We don't have a paid license yet; we will have one in a month or so.
We have not set up any schedule to delete the data, only ILM.
I did modify some wording and other irrelevant details in the logs before uploading. The original index name is idit-logstash-* :slight_smile:

I would expect all applications and systems indexing or querying data in the cluster to have custom roles/users defined, and the superuser should not be used for anything most of the time. Please verify whether this is the case or not.

If the superuser is the only role that has the privilege to delete these indices, and it should not be in regular use, I would recommend changing the password to see if this makes a difference. I still suspect someone has set up some external cron job outside the cluster that is triggering the delete.
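
A sketch of the password change, assuming the built-in elastic superuser is the account in question (substitute whichever superuser account you actually use):

# Change the password of the built-in superuser
POST /_security/user/elastic/_password
{
  "password": "a-new-strong-password"
}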

After setting "index.blocks.metadata": true on logstash-2025.07.14, the index survived overnight, but the log shows nothing at T15:00:00. I checked all nodes.

{"@timestamp":"2025-07-24T14:56:04.137Z", "log.level":"DEBUG", "message":"[.internal.alerts-transform.health.alerts-default-000001] running periodic policy with current-step [{\"phase\":\"hot\",\"action\":\"rollover\",\"name\":\"check-rollover-ready\"}]", "ecs.version": "1.2.0","service.name":"ES_ECS","event.dataset":"elasticsearch.server","process.thread.name":"elasticsearch[efk-es-elasticsearch-0][trigger_engine_scheduler][T#1]","log.logger":"org.elasticsearch.xpack.ilm.IndexLifecycleRunner","elasticsearch.cluster.uuid":"qy0_RtNxTIa0vtQmUGikSA","elasticsearch.node.id":"7KJtuPA1Q4u1cxdAfzxEVA","elasticsearch.node.name":"efk-es-elasticsearch-0","elasticsearch.cluster.name":"efk"}
{"@timestamp":"2025-07-24T14:56:04.137Z", "log.level":"DEBUG", "message":"[.internal.alerts-observability.threshold.alerts-default-000001] running periodic policy with current-step [{\"phase\":\"hot\",\"action\":\"rollover\",\"name\":\"check-rollover-ready\"}]", "ecs.version": "1.2.0","service.name":"ES_ECS","event.dataset":"elasticsearch.server","process.thread.name":"elasticsearch[efk-es-elasticsearch-0][trigger_engine_scheduler][T#1]","log.logger":"org.elasticsearch.xpack.ilm.IndexLifecycleRunner","elasticsearch.cluster.uuid":"qy0_RtNxTIa0vtQmUGikSA","elasticsearch.node.id":"7KJtuPA1Q4u1cxdAfzxEVA","elasticsearch.node.name":"efk-es-elasticsearch-0","elasticsearch.cluster.name":"efk"}
{"@timestamp":"2025-07-24T15:06:04.135Z", "log.level":"DEBUG", "message":"[.internal.alerts-default.alerts-default-000001] running periodic policy with current-step [{\"phase\":\"hot\",\"action\":\"rollover\",\"name\":\"check-rollover-ready\"}]", "ecs.version": "1.2.0","service.name":"ES_ECS","event.dataset":"elasticsearch.server","process.thread.name":"elasticsearch[efk-es-elasticsearch-0][trigger_engine_scheduler][T#1]","log.logger":"org.elasticsearch.xpack.ilm.IndexLifecycleRunner","elasticsearch.cluster.uuid":"qy0_RtNxTIa0vtQmUGikSA","elasticsearch.node.id":"7KJtuPA1Q4u1cxdAfzxEVA","elasticsearch.node.name":"efk-es-elasticsearch-0","elasticsearch.cluster.name":"efk"}
{"@timestamp":"2025-07-24T15:06:04.135Z", "log.level":"DEBUG", "message":"[.internal.alerts-stack.alerts-default-000001] running periodic policy with current-step [{\"phase\":\"hot\",\"action\":\"rollover\",\"name\":\"check-rollover-ready\"}]", "ecs.version": "1.2.0","service.name":"ES_ECS","event.dataset":"elasticsearch.server","process.thread.name":"elasticsearch[efk-es-elasticsearch-0][trigger_engine_scheduler][T#1]","log.logger":"org.elasticsearch.xpack.ilm.IndexLifecycleRunner","elasticsearch.cluster.uuid":"qy0_RtNxTIa0vtQmUGikSA","elasticsearch.node.id":"7KJtuPA1Q4u1cxdAfzxEVA","elasticsearch.node.name":"efk-es-elasticsearch-0","elasticsearch.cluster.name":"efk"}

Would a failed delete be logged? I don’t know your logging config. Try it out with a test index.

BTW, is the original ILM policy still in place? If not, i.e. if you changed 2 things, maybe revert one change and see what happens.

Yes, the ILM policy is still in place.

I tried the test with "index.blocks.metadata": true on the indextest index in Dev Tools:

{
  "error": {
    "root_cause": [
      {
        "type": "cluster_block_exception",
        "reason": "index [indextest] blocked by: [FORBIDDEN/9/index metadata (api)];"
      }
    ],
    "type": "cluster_block_exception",
    "reason": "index [indextest] blocked by: [FORBIDDEN/9/index metadata (api)];"
  },
  "status": 403
}

It shows the error message on the right-hand side, but no error is logged in elasticsearch.log:

{"@timestamp":"2025-07-25T08:07:11.617Z", "log.level":"TRACE", "message":"0 tombstones purged from the cluster state. Previous tombstone size: 455. Current tombstone size: 456.", "ecs.version": "1.2.0","service.name":"ES_ECS","event.dataset":"elasticsearch.server","process.thread.name":"elasticsearch[efk-es-elasticsearch-0][masterService#updateTask][T#2452]","log.logger":"org.elasticsearch.cluster.metadata.MetadataDeleteIndexService","elasticsearch.cluster.uuid":"qy0_RtNxTIa0vtQmUGikSA","elasticsearch.node.id":"7KJtuPA1Q4u1cxdAfzxEVA","elasticsearch.node.name":"efk-es-elasticsearch-0","elasticsearch.cluster.name":"efk"}
{"@timestamp":"2025-07-25T08:07:56.191Z", "log.level": "INFO", "message":"[indextest] creating index, cause [auto(bulk api)], templates [], shards [1]/[1]", "ecs.version": "1.2.0","service.name":"ES_ECS","event.dataset":"elasticsearch.server","process.thread.name":"elasticsearch[efk-es-elasticsearch-0][masterService#updateTask][T#2452]","log.logger":"org.elasticsearch.cluster.metadata.MetadataCreateIndexService","elasticsearch.cluster.uuid":"qy0_RtNxTIa0vtQmUGikSA","elasticsearch.node.id":"7KJtuPA1Q4u1cxdAfzxEVA","elasticsearch.node.name":"efk-es-elasticsearch-0","elasticsearch.cluster.name":"efk"}
{"@timestamp":"2025-07-25T08:07:56.506Z", "log.level": "INFO", "message":"[indextest/BxEnA3oxTzuABN6SnzkR-g] create_mapping", "ecs.version": "1.2.0","service.name":"ES_ECS","event.dataset":"elasticsearch.server","process.thread.name":"elasticsearch[efk-es-elasticsearch-0][masterService#updateTask][T#2452]","log.logger":"org.elasticsearch.cluster.metadata.MetadataMappingService","elasticsearch.cluster.uuid":"qy0_RtNxTIa0vtQmUGikSA","elasticsearch.node.id":"7KJtuPA1Q4u1cxdAfzxEVA","elasticsearch.node.name":"efk-es-elasticsearch-0","elasticsearch.cluster.name":"efk"}
{"@timestamp":"2025-07-25T08:07:56.747Z", "log.level": "INFO",  "current.health":"GREEN","message":"Cluster health status changed from [YELLOW] to [GREEN] (reason: [shards started [[indextest][0]]]).","previous.health":"YELLOW","reason":"shards started [[indextest][0]]" , "ecs.version": "1.2.0","service.name":"ES_ECS","event.dataset":"elasticsearch.server","process.thread.name":"elasticsearch[efk-es-elasticsearch-0][masterService#updateTask][T#2452]","log.logger":"org.elasticsearch.cluster.routing.allocation.AllocationService","elasticsearch.cluster.uuid":"qy0_RtNxTIa0vtQmUGikSA","elasticsearch.node.id":"7KJtuPA1Q4u1cxdAfzxEVA","elasticsearch.node.name":"efk-es-elasticsearch-0","elasticsearch.cluster.name":"efk"}

GET _cluster/settings:

{
  "persistent": {},
  "transient": {
    "logger": {
      "org": {
        "elasticsearch": {
          "cluster": {
            "metadata": {
              "MetadataDeleteIndexService": "TRACE"
            }
          },
          "xpack": {
            "ilm": "DEBUG"
          }
        }
      }
    }
  }
}

I'm almost sure there will be a way to enable logging that would show the attempt; I just don't know what it is.
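
One guess, and it is only a guess, would be to raise the logger for the delete action's package via the same cluster settings API you already used, and see whether a blocked attempt then shows up:

PUT /_cluster/settings
{
  "transient": {
    "logger.org.elasticsearch.action.admin.indices.delete": "TRACE"
  }
}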

But anyway, the index survived. We can't assume with certainty that a delete attempt was made, but there's no reason to think it wasn't.

You can sniff traffic to port 9200 around the key time and see if a connection is made, and if so, from where. If it's a cron-like tool doing the deleting, there's a decent chance it just creates a new connection, tries, fails, then exits.

You won't get those logs without having the audit trail enabled, which requires a paid license.

But as already mentioned, Elasticsearch will only delete data in two cases: when you have an ILM policy configured, or when it receives a request to delete it.

So I still think that there is some kind of forgotten script or cron job running to delete the data.

Is your cluster using HTTPS? If not, since the deletion happens at a known time, you could configure a packet capture with tcpdump to see the requests arriving at your nodes, filtering for a DELETE, for example.


Yeah, this is what I suggested. If HTTPS is used, then you might still be lucky and see an unexpected connection from some host.

@leo-lee - Also, on the audit trail via the trial license, and I don't want to suggest anything dodgy, but if that enables an easier solution to this problem then why not use it? When the trial runs out, things just revert to the basic license - you will not be losing anything?
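
If you go that route, roughly (the trial is started via the API; audit logging itself is a static setting, so it goes in elasticsearch.yml / your Helm values and needs a node restart rather than a Dev Tools call):

# Start the 30-day trial license
POST /_license/start_trial?acknowledge=true

# Then, per node, in elasticsearch.yml (restart required):
#   xpack.security.audit.enabled: true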


I've set "index.blocks.metadata": false to indices logstash-2025.07.14 on Friday, and it was deleted on that night. I double checked the node, the pipeline and the helm chart with curator or cronjob but nothing. it's interesting that only one environment has this issue and all other environments use the same chart.
Thanks everyone for the suggestions, I'll try something else before we got the license to deploy the audit track. Happy to close this now, thanks!

Have you tried changing the passwords and API keys?
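
For the API keys, a sketch (invalidating by owner username is just one option; the username below is a placeholder):

# List the API keys you can see
GET /_security/api_key

# Invalidate all API keys owned by a given user
DELETE /_security/api_key
{
  "username": "some-service-user"
}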

By the way, there's nothing dodgy about using the trial license if it's available; that's exactly what it's there for...

If you've already used it in the past, though, it won't be available.
