This request follows a previous ticket that was never really answered.
I have created a data stream with an ILM policy whose phases range from Hot to Delete.
The backing indices successfully move into the Frozen phase but never continue on to the Delete phase.
Why are the indices not moving to the delete phase?
Here is the lifecycle policy:
{
"filebeat" : {
"version" : 11,
"modified_date" : "2022-05-11T10:05:49.653Z",
"policy" : {
"phases" : {
"frozen" : {
"min_age" : "70m",
"actions" : {
"searchable_snapshot" :…
ES 8.8.2, free trial locally; paid Enterprise licence in another environment.
Same problem here. I have created a data stream with an ILM policy with the phases Hot, Frozen, and Delete.
The backing indices are successfully moved into the Frozen phase but never continue into the Delete phase.
Why are the indices not moving to the delete phase?
First, add some cluster settings and a lifecycle policy with an AWS S3 searchable snapshot repository.
PUT _cluster/settings
{
"transient": {
"indices.lifecycle.poll_interval": "30s"
}
}
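Side note on the setting above: transient cluster settings do not survive a full cluster restart, so for anything longer than a quick test the persistent variant is safer (same setting; the default poll interval is 10m):

```
PUT _cluster/settings
{
  "persistent": {
    "indices.lifecycle.poll_interval": "30s"
  }
}
```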
PUT _ilm/policy/test-policy
{
"policy": {
"phases": {
"hot": {
"min_age": "0ms",
"actions": {
"set_priority": {
"priority": 100
},
"rollover": {
"max_primary_shard_size": "50mb",
"max_age": "10s"
}
}
},
"frozen": {
"min_age": "5m",
"actions": {
"searchable_snapshot": {
"snapshot_repository": "snapshot_s3_repository"
}
}
},
"delete": {
"min_age": "15m",
"actions": {
"delete": {}
}
}
}
}
}
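One thing worth double-checking in the delete phase: by default the delete action also removes the searchable snapshot backing the index. Making that explicit with the documented `delete_searchable_snapshot` option (shown here only to illustrate the default, not as a fix) would look like:

```
"delete": {
  "min_age": "15m",
  "actions": {
    "delete": {
      "delete_searchable_snapshot": true
    }
  }
}
```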
Then create an index template with the lifecycle policy and data stream enabled, and add some data.
PUT _index_template/test-template
{
"index_patterns": ["test-index*"],
"data_stream": { },
"template": {
"mappings": {
"properties": {
"@timestamp": {
"type": "date",
"format": "date_optional_time||epoch_millis"
}
}
},
"settings": {
"lifecycle": {
"name": "test-policy"
},
"number_of_shards": 1,
"number_of_replicas": 0
}
}
}
POST test-index-1/_doc
{
"field1": "someValue2",
"@timestamp": 1689718060023
}
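After indexing, it's worth confirming that the data stream exists and that ILM is not reporting any errors before digging into logs. These are standard APIs, nothing specific to this repro:

```
GET _data_stream/test-index-1

GET test-index-1/_ilm/explain?only_errors=true
```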
Here is an excerpt of elasticsearch.log after one hour:
[2023-07-19T00:02:51,667][INFO ][o.e.c.s.ClusterSettings ] [AWA-LAPTOP] updating [indices.lifecycle.poll_interval] from [10s] to [30s]
[2023-07-19T00:03:52,957][INFO ][o.e.x.i.a.TransportPutLifecycleAction] [AWA-LAPTOP] adding index lifecycle policy [test-policy]
[2023-07-19T00:06:17,269][INFO ][o.e.c.m.MetadataIndexTemplateService] [AWA-LAPTOP] adding index template [test-template] for index patterns [test-index*]
[2023-07-19T00:06:34,162][INFO ][o.e.c.m.MetadataIndexTemplateService] [AWA-LAPTOP] updating index template [test-template] for index patterns [test-index*]
[2023-07-19T00:08:28,387][INFO ][o.e.c.m.MetadataCreateIndexService] [AWA-LAPTOP] [.ds-test-index-1-2023.07.18-000001] creating index, cause [initialize_data_stream], templates [test-template], shards [1]/[0]
[2023-07-19T00:08:28,391][INFO ][o.e.c.m.MetadataCreateDataStreamService] [AWA-LAPTOP] adding data stream [test-index-1] with write index [.ds-test-index-1-2023.07.18-000001], backing indices [], and aliases []
[2023-07-19T00:08:28,817][INFO ][o.e.x.i.IndexLifecycleTransition] [AWA-LAPTOP] moving index [.ds-test-index-1-2023.07.18-000001] from [null] to [{"phase":"new","action":"complete","name":"complete"}] in policy [test-policy]
[2023-07-19T00:08:28,960][INFO ][o.e.x.i.IndexLifecycleTransition] [AWA-LAPTOP] moving index [.ds-test-index-1-2023.07.18-000001] from [{"phase":"new","action":"complete","name":"complete"}] to [{"phase":"hot","action":"set_priority","name":"set_priority"}] in policy [test-policy]
[2023-07-19T00:08:29,367][INFO ][o.e.c.m.MetadataMappingService] [AWA-LAPTOP] [.ds-test-index-1-2023.07.18-000001/2ff-u2EuQGmSQOyLB9qKKQ] update_mapping [_doc]
[2023-07-19T00:08:29,516][INFO ][o.e.x.i.IndexLifecycleTransition] [AWA-LAPTOP] moving index [.ds-test-index-1-2023.07.18-000001] from [{"phase":"hot","action":"set_priority","name":"set_priority"}] to [{"phase":"hot","action":"unfollow","name":"branch-check-unfollow-prerequisites"}] in policy [test-policy]
(elasticsearch.log truncated)
ILM Explain
GET test-index-1/_ilm/explain
{
"indices": {
".ds-test-index-1-2023.07.18-000001": {
"index": ".ds-test-index-1-2023.07.18-000001",
"managed": true,
"policy": "test-policy",
"index_creation_date_millis": 1689718108382,
"time_since_index_creation": "33.78m",
"lifecycle_date_millis": 1689718131813,
"age": "33.39m",
"phase": "frozen",
"phase_time_millis": 1689718461739,
"action": "searchable_snapshot",
"action_time_millis": 1689718461739,
"step": "wait-for-index-color",
"step_time_millis": 1689718524472,
"repository_name": "snapshot_s3_repository",
"snapshot_name": "2023.07.18-.ds-test-index-1-2023.07.18-000001-test-policy-dgun9fltszwfctmnc1zj-a",
"step_info": {
"message": "index is not green; not all shards are active"
},
"phase_execution": {
"policy": "test-policy",
"phase_definition": {
"min_age": "5m",
"actions": {
"searchable_snapshot": {
"snapshot_repository": "snapshot_s3_repository",
"force_merge_index": true
}
}
},
"version": 1,
"modified_date_in_millis": 1689717832957
}
},
".ds-test-index-1-2023.07.18-000002": {
"index": ".ds-test-index-1-2023.07.18-000002",
"managed": true,
"policy": "test-policy",
"index_creation_date_millis": 1689718131896,
"time_since_index_creation": "33.39m",
"lifecycle_date_millis": 1689718131896,
"age": "33.39m",
"phase": "hot",
"phase_time_millis": 1689718132431,
"action": "rollover",
"action_time_millis": 1689718133239,
"step": "check-rollover-ready",
"step_time_millis": 1689718133239,
"phase_execution": {
"policy": "test-policy",
"phase_definition": {
"min_age": "0ms",
"actions": {
"set_priority": {
"priority": 100
},
"rollover": {
"max_age": "10s",
"max_primary_shard_size": "50mb"
}
}
},
"version": 1,
"modified_date_in_millis": 1689717832957
}
}
}
}
My index is still green, though:
GET /_cat/indices/.ds-test-index-1-2023.07.18-000001?v
health status index uuid pri rep docs.count docs.deleted store.size pri.store.size
green open .ds-test-index-1-2023.07.18-000001 2ff-u2EuQGmSQOyLB9qKKQ 1 0 1 0 4.3kb 4.3kb
I don't really see why I get "index is not green; not all shards are active" when everything looks green, nor why the transition to delete never happens. Maybe I should open an issue on GitHub?
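One possible explanation (an assumption on my part, since I can't see your node configuration): the `wait-for-index-color` step is waiting on the partially mounted searchable snapshot index, which ILM names with a `partial-` prefix, not on the original backing index that `_cat/indices` reports as green. A partially mounted index can only be allocated to a node with the `data_frozen` role and a configured shared snapshot cache (`xpack.searchable.snapshot.shared_cache.size`); on a default single-node local setup those shards can stay unassigned indefinitely, which would match the "not all shards are active" message. A few diagnostics to narrow it down:

```
GET _cat/shards/partial-*?v

GET _cat/nodes?v&h=name,node.role

GET _cluster/allocation/explain
```

The last call, with no body, explains why the first unassigned shard it finds cannot be allocated.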
I have a feeling this isn't the first report; there seem to be similar issues already:
Issue opened 05:28 PM, 03 Sep 21 UTC; closed 07:19 PM, 04 Mar 22 UTC. Labels: >bug, :Data Management/ILM+SLM, Team:Data Management
**Elasticsearch version** (`bin/elasticsearch --version`): 7.14.1
**Description of the problem including expected versus actual behavior**:
If an ILM policy uses both `searchable_snapshot` and `allocate` in the cold phase, then the `allocate` action won't work right, and in fact will get wedged permanently (see workaround below for how to unstick it).
**Steps to reproduce**:
```
PUT _cluster/settings
{
"transient": {
"indices.lifecycle.poll_interval": "30s"
}
}
PUT _ilm/policy/test-policy
{
"policy": {
"phases": {
"hot": {
"min_age": "0m",
"actions": {
"set_priority": {
"priority": 100
}
}
},
"warm": {
"min_age": "2m",
"actions": {
"set_priority": {
"priority": 50
}
}
},
"cold": {
"min_age": "4m",
"actions": {
"set_priority": {
"priority": 100
},
"searchable_snapshot": {
"snapshot_repository": "found-snapshots"
},
"allocate": {
"number_of_replicas": 0
}
}
},
"frozen": {
"min_age": "6m",
"actions": {
"searchable_snapshot": {
"snapshot_repository": "found-snapshots"
}
}
}
}
}
}
PUT _template/test-template
{
"index_patterns": ["test-index*"],
"settings": {
"lifecycle": {
"name": "test-policy"
},
"number_of_shards": 1,
"number_of_replicas": 1
}
}
POST test-index-1/_doc
{
"field1": "someValue2"
}
```
Once the policy gets to the `allocate` action & step, it'll just sit there forever:
```
GET test-index-1/_ilm/explain
{
"indices" : {
"restored-test-index-1" : {
"index" : "restored-test-index-1",
"managed" : true,
"policy" : "test-policy",
"lifecycle_date_millis" : 1630689049418,
"age" : "6.5m",
"phase" : "cold",
"phase_time_millis" : 1630689317926,
"action" : "allocate",
"action_time_millis" : 1630689318037,
"step" : "allocate",
"step_time_millis" : 1630689379467,
"repository_name" : "found-snapshots",
"snapshot_name" : "2021.09.03-test-index-1-test-policy-vx9cjvout8q3ahibxej20g",
"phase_execution" : {
"policy" : "test-policy",
"phase_definition" : {
"min_age" : "4m",
"actions" : {
"allocate" : {
"number_of_replicas" : 0,
"include" : { },
"exclude" : { },
"require" : { }
},
"searchable_snapshot" : {
"snapshot_repository" : "found-snapshots",
"force_merge_index" : true
},
"set_priority" : {
"priority" : 100
}
}
},
"version" : 4,
"modified_date_in_millis" : 1630689027809
}
}
}
}
```
**Workaround**:
For any stuck indices, if you manually move the stuck index to complete, then everything will pick back up.
```
POST /_ilm/move/restored-test-index-1
{
"current_step": {
"phase": "cold",
"action": "allocate",
"name": "allocate"
},
"next_step": {
"phase": "cold",
"action": "complete",
"name": "complete"
}
}
```
Issue opened 03:53 PM, 27 Aug 21 UTC. Labels: >bug, :Data Management/ILM+SLM, Team:Data Management
**Elasticsearch version** : 7.14.0
**Plugins installed**: []
**Description of the problem including expected versus actual behavior**:
**Current behavior**: ILM doesn't delete searchable snapshots when one of the associated indices waiting to go from a hot to a cold phase and then to a delete phase finds itself in a red status. Even if the index later recovers, the ILM policy doesn't return to deleting the searchable snapshots for any of the subsequent indices. This has the consequence that the storage can fill up on a hot or cold node because the snapshot is fully mounted.
**Expected behavior**: ILM should resume to deleting searchable snapshot for the next indices regardless of failures on previous indices.
**Steps to reproduce**:
1. Create ILM policy with moving indices from hot to cold nodes with fully mounted searchable snapshots, with these settings:
<details> <summary> ILM Settings </summary>
```
"policy" : {
"phases" : {
"hot" : {
"min_age" : "0ms",
"actions" : {
"rollover" : {
"max_primary_shard_size" : "3mb",
"max_age" : "30s"
},
"set_priority" : {
"priority" : 100
}
}
},
"delete" : {
"min_age" : "90s",
"actions" : {
"delete" : {
"delete_searchable_snapshot" : true
}
}
},
"cold" : {
"min_age" : "1m",
"actions" : {
"allocate" : {
"number_of_replicas" : 0,
"include" : { },
"exclude" : { },
"require" : { }
},
"searchable_snapshot" : {
"snapshot_repository" : "found-snapshots",
"force_merge_index" : true
},
"set_priority" : {
"priority" : 0
}
}
}
}
}
```
</details>
2. Create index template with 3 shards:
<details> <summary> Index Template </summary>
```
PUT _index_template/template1
{
"index_patterns": ["test*"],
"template": {
"settings": {
"number_of_shards": 3,
"number_of_replicas": 0,
"index.lifecycle.name": "move_hot_cold_ss",
"index.lifecycle.rollover_alias": "testkibana"
}
}
}
```
</details>
3. Bootstrap the index.
4. Observe ILM process runs smoothly
5. Cause a red index status (for my repro I paused one of my hot nodes to cause this).
6. Wait for ILM process to fail on move to cold phase because index is missing primary shards.
7. Recover cluster to green status (I resumed my hot node).
8. Observe how ILM no longer deletes searchable snapshot indices.
9. Observe how ILM process continues rolling over indices, creating searchable snapshots, and not deleting the searchable snapshots.
10. Observe how the ilm-history index doesn't show any errors or failures (besides the one where the primary shards were gone).
11. ILM history doesn't provide any errors in the running phases.
<details> <summary> screengrab of ilm history </summary>
Correct ILM behavior before triggering issue:
<img width="1450" alt="Screen Shot 2021-08-26 at 19 27 07" src="https://user-images.githubusercontent.com/62263912/131052704-14217ec4-809e-4c9e-92bd-74476e745157.png">
After triggering issue, note no delete step:
<img width="1444" alt="Screen Shot 2021-08-26 at 19 28 28" src="https://user-images.githubusercontent.com/62263912/131052795-b359a54f-e2e3-4fc5-8772-afa76039a0c5.png">
</details>
12. Observe how any GET ilm explains don't tell you any errors, for example:
<details> <summary> get ilm explains for index and searchable snapshot index </summary>
```
{
"indices" : {
"restored-testkibana_sample-000043" : {
"index" : "restored-testkibana_sample-000043",
"managed" : true,
"policy" : "move_hot_cold_ss",
"lifecycle_date_millis" : 1630022898837,
"age" : "3.73m",
"phase" : "cold",
"phase_time_millis" : 1630022960237,
"action" : "allocate",
"action_time_millis" : 1630022963864,
"step" : "allocate",
"step_time_millis" : 1630023086716,
"repository_name" : "found-snapshots",
"snapshot_name" : "2021.08.27-testkibana_sample-000043-move_hot_cold_ss-vyjwezscr2uq3qwuls0gfg",
"phase_execution" : {
"policy" : "move_hot_cold_ss",
"phase_definition" : {
"min_age" : "1m",
"actions" : {
"allocate" : {
"number_of_replicas" : 0,
"include" : { },
"exclude" : { },
"require" : { }
},
"searchable_snapshot" : {
"snapshot_repository" : "found-snapshots",
"force_merge_index" : true
},
"set_priority" : {
"priority" : 0
}
}
},
"version" : 3,
"modified_date_in_millis" : 1630021406861
}
}
}
}
```
</details>
**Things I tried to recover from the issue**
1. Tried stopping, starting ILM
2. Tried deleting snapshots and indices where the issue had occurred.
3. A friend suggested forcing the stuck indices to the ILM next step, but that is not scalable when the issue has been going on for a while and there is a whole lot of indices with the problem.
**Interesting observation**
If you add the same policy to a different index pattern, the same issue happens, however a new ilm policy doesn't get affected (or other existing policies). So it seems this affects only the policy associated to the index with the failures.
Seems the only available workaround is to associate the new indices to a new policy of the same type through the index template.