ILM not working correctly or is stuck somewhere

Guncixx · June 16, 2022, 10:47am

We have a problem with ILM, which does not seem to be working as intended. We added a cold node to the cluster, and I set the "logs" policy to the default rollover thresholds (50 GB or 30 days), since our hot instance can handle that. After 30 days, indices should be moved to cold.

{
  "logs" : {
    "version" : 25,
    "modified_date" : "2022-06-16T09:30:36.061Z",
    "policy" : {
      "phases" : {
        "hot" : {
          "min_age" : "0ms",
          "actions" : {
            "rollover" : {
              "max_primary_shard_size" : "50gb",
              "max_age" : "30d"
            },
            "set_priority" : {
              "priority" : 100
            }
          }
        },
        "cold" : {
          "min_age" : "30d",
          "actions" : {
            "allocate" : {
              "number_of_replicas" : 0,
              "include" : { },
              "exclude" : { },
              "require" : { }
            },
            "set_priority" : {
              "priority" : 50
            }
          }
        }
      }
    },

ILM has started to work and put all the old indices into the migrate action, but they have been stuck there for some time. Judging by disk I/O, the cluster does not appear to be continuously writing them to the cold node.

Action status
[.ds-logs-endpoint.events.network-default-2022.04.12-000012] lifecycle action [migrate] waiting for [1] shards to be moved to the [data_cold] tier (tier migration preference configuration is [data_cold, data_warm, data_hot])

{
  "indices" : {
    ".ds-logs-endpoint.events.network-default-2022.04.12-000012" : {
      "index" : ".ds-logs-endpoint.events.network-default-2022.04.12-000012",
      "managed" : true,
      "policy" : "logs",
      "index_creation_date_millis" : 1649738075698,
      "time_since_index_creation" : "65.25d",
      "lifecycle_date_millis" : 1649789075719,
      "age" : "64.66d",
      "phase" : "cold",
      "phase_time_millis" : 1655284574757,
      "action" : "migrate",
      "action_time_millis" : 1655284575359,
      "step" : "check-migration",
      "step_time_millis" : 1655284576160,
      "step_info" : {
        "message" : "[.ds-logs-endpoint.events.network-default-2022.04.12-000012] lifecycle action [migrate] waiting for [1] shards to be moved to the [data_cold] tier (tier migration preference configuration is [data_cold, data_warm, data_hot])",
        "shards_left_to_allocate" : 1,
        "all_shards_active" : true,
        "number_of_replicas" : 0
      },
      "phase_execution" : {
        "policy" : "logs",
        "phase_definition" : {
          "min_age" : "30d",
          "actions" : {
            "set_priority" : {
              "priority" : 50
            }
          }
        },
        "version" : 24,
        "modified_date_in_millis" : 1655107428064
      }
    }
  }
}
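Two details in the output above are easy to miss: one shard is still waiting to be allocated, and the cached phase_execution is at version 24 while the policy itself is at version 25. The following is a minimal sketch, not an official tool, that condenses explain output into those fields. The function name summarize_ilm_explain and the policy_version_lag field are made up for illustration, and the sample is trimmed from the output above.

```python
def summarize_ilm_explain(explain: dict, current_policy_version: int) -> list:
    """Condense ILM explain output into one summary row per index.

    `current_policy_version` is the version reported by the policy itself
    (25 in this thread); it is passed in by the caller, not looked up.
    """
    rows = []
    for name, info in explain.get("indices", {}).items():
        step_info = info.get("step_info", {})
        cached_version = info.get("phase_execution", {}).get("version")
        rows.append({
            "index": name,
            "phase": info.get("phase"),
            "action": info.get("action"),
            "step": info.get("step"),
            "shards_left_to_allocate": step_info.get("shards_left_to_allocate"),
            # How many policy versions behind the cached phase definition is.
            "policy_version_lag": (
                None if cached_version is None
                else current_policy_version - cached_version
            ),
        })
    return rows


# Sample trimmed from the explain output posted above.
sample = {
    "indices": {
        ".ds-logs-endpoint.events.network-default-2022.04.12-000012": {
            "phase": "cold",
            "action": "migrate",
            "step": "check-migration",
            "step_info": {"shards_left_to_allocate": 1, "all_shards_active": True},
            "phase_execution": {"policy": "logs", "version": 24},
        }
    }
}

rows = summarize_ilm_explain(sample, current_policy_version=25)
```

A non-zero policy_version_lag only says that the index is still running a phase definition cached from an older policy version. It does not by itself explain why the shard has not moved.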

So we can't tell whether it is slowly copying data over or is stuck somewhere. We have 2 hot nodes, and one of them is in the same location as the cold node, so copying between those two should be fast.
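One way to tell "slowly copying" from "stuck" is to ask the cluster directly: the cluster allocation explain API reports why a shard is or is not allowed to move, and the cat recovery API lists shard copies currently in flight. The sketch below only builds the two requests so they can be sent with any HTTP client or pasted into Kibana Dev Tools. The helper name build_diagnostic_requests is hypothetical, and shard number 0 is an assumption (the output above only says one shard is left to allocate).

```python
from urllib.parse import quote


def build_diagnostic_requests(index: str, shard: int = 0) -> list:
    """Return request descriptions for checking a shard that ILM is waiting on.

    `shard=0` is an assumed shard number; adjust it if the index has
    more than one primary shard.
    """
    allocation_explain = {
        "method": "GET",
        "path": "/_cluster/allocation/explain",
        # Explains the allocation decision for this specific primary shard.
        "body": {"index": index, "shard": shard, "primary": True},
    }
    active_recoveries = {
        "method": "GET",
        # active_only=true limits the listing to recoveries still running.
        "path": f"/_cat/recovery/{quote(index, safe='')}?v=true&active_only=true",
        "body": None,
    }
    return [allocation_explain, active_recoveries]


requests_to_send = build_diagnostic_requests(
    ".ds-logs-endpoint.events.network-default-2022.04.12-000012"
)
```

If the recovery listing shows no active recovery for the index, nothing is being copied, and the allocation explain response is the place to look for the reason the shard cannot move to a cold node.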
There are also indices in the check-rollover-ready step that have not reached 30 days or 50 GB, so I am also wondering what ILM is trying to do there and why the action is rollover.

{
  "indices" : {
    ".ds-logs-nginx.error-default-2022.05.31-000003" : {
      "index" : ".ds-logs-nginx.error-default-2022.05.31-000003",
      "managed" : true,
      "policy" : "logs",
      "index_creation_date_millis" : 1653999275177,
      "time_since_index_creation" : "15.93d",
      "lifecycle_date_millis" : 1653999275177,
      "age" : "15.93d",
      "phase" : "hot",
      "phase_time_millis" : 1654888740663,
      "action" : "rollover",
      "action_time_millis" : 1654888742464,
      "step" : "check-rollover-ready",
      "step_time_millis" : 1654888742464,
      "phase_execution" : {
        "policy" : "logs",
        "phase_definition" : {
          "min_age" : "0ms",
          "actions" : {
            "set_priority" : {
              "priority" : 100
            },
            "rollover" : {
              "max_primary_shard_size" : "50gb",
              "max_age" : "30d"
            }
          }
        },
        "version" : 25,
        "modified_date_in_millis" : 1655371836061
      }
    }
  }
}
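On the rollover question: check-rollover-ready is the step where ILM waits in the hot phase while it periodically evaluates the rollover conditions, so an index sitting in that step before reaching either threshold is expected. The sketch below reproduces that check by hand for the index above. The function names are hypothetical, the age parser only handles the "d" suffix, and the 1.0 GB primary shard size is an invented value because the post does not give the real one.

```python
def parse_days(age: str) -> float:
    """Parse an ILM age string such as '15.93d' (only the 'd' suffix is handled)."""
    if not age.endswith("d"):
        raise ValueError(f"unsupported age format: {age!r}")
    return float(age[:-1])


def rollover_due(age_days: float, primary_shard_gb: float,
                 max_age_days: float = 30.0,
                 max_primary_shard_gb: float = 50.0) -> bool:
    """Return True if any rollover condition is met.

    The defaults mirror the policy in this thread
    (max_age 30d, max_primary_shard_size 50gb).
    """
    return age_days >= max_age_days or primary_shard_gb >= max_primary_shard_gb


age = parse_days("15.93d")  # "age" from the explain output above
due_now = rollover_due(age, primary_shard_gb=1.0)  # 1.0 GB is an assumed size
```

With these inputs neither condition is met, which is consistent with the index remaining in check-rollover-ready without rolling over.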

system · July 14, 2022, 10:47am

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
