ILM rollover stuck in warm phase

I have successfully tested a simple hot/delete index rollover. Now I am trying to add a warm node and try a hot/warm/delete rollover pattern, but I am not able to get the warm indices to move to the delete phase.

My policy:

PUT _ilm/policy/hot-warm-delete
{
  "policy": {
    "phases": {
      "hot": {
        "actions": {
          "rollover": {
            "max_size":"50gb",
            "max_age":"1m"
          },
          "set_priority": {
            "priority": 50
          }
        }
      },
      "warm": {
        "min_age": "1m",
        "actions": {
          "forcemerge": {
            "max_num_segments": 1
          },
          "shrink": {
            "number_of_shards": 1
          },
          "allocate": {
            "require": {
              "data": "warm"
            }
          },
          "set_priority": {
            "priority": 25
          }
        }
      },
      "delete": {
        "min_age": "5m",
        "actions": {
          "delete": {}
        }
      }
    }
  }
}
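
A side note on timing: since the phase ages above are only minutes, the default ILM poll interval of 10 minutes would hide any progress between checks, so for a test like this I assume it has to be shortened. A minimal sketch using the cluster settings API; "1m" is just an arbitrary test value:

PUT _cluster/settings
{
  "transient": {
    "indices.lifecycle.poll_interval": "1m"
  }
}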

My index template:

PUT _template/hot-warm-delete-temp
{
  "index_patterns": ["hot-warm-delete-*"], 
  "settings": {
    "index.lifecycle.name": "hot-warm-delete", 
    "index.lifecycle.rollover_alias": "hot-warm-delete-alias" 
  }
}
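
To sanity-check that this template actually applies to the test indices, it can be fetched back and the pattern compared against the index names; this is just the standard get-template call:

GET _template/hot-warm-delete-temp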

Creating the first index (bootstrapping):

PUT hot-warm-delete-001 
{
  "aliases": {
    "hot-warm-delete-alias":{
      "is_write_index": true 
    }
  }
} 
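
After bootstrapping, all writes go through the alias rather than the concrete index, so rollover can switch the write index underneath. A minimal test document for generating some data; the field names here are placeholders, not from my real data:

POST hot-warm-delete-alias/_doc
{
  "@timestamp": "2020-04-17T12:00:00Z",
  "message": "ilm rollover test"
}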

Query: GET _cat/nodes?v

ip              heap.percent ram.percent cpu load_1m load_5m load_15m node.role master name
192.168.137.58            38          45   1    0.20    0.07     0.02 dilm      *      node-1
192.168.137.197           33          33   0    0.00    0.00     0.00 dilm      -      node-2

Note: node-1 has node.attr.box_type: "hot" and node-2 has node.attr.box_type: "warm", set in their respective elasticsearch.yml files.
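
To confirm the cluster actually sees those attributes, they can be listed with the cat nodeattrs API:

GET _cat/nodeattrs?v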

Query: GET /*/_ilm/explain?filter_path=indices.*.step*

{
  "indices" : {
    "filebeat-7.6.0-2020.04.10-000001" : {
      "step" : "check-rollover-ready",
      "step_time_millis" : 1586562150701
    },
    "hot-warm-delete-000006" : {
      "step" : "complete",
      "step_time_millis" : 1587070237126
    },
    "hot-warm-delete-000007" : {
      "step" : "wait-for-follow-shard-tasks",
      "step_time_millis" : 1587070237257
    },
    "hot-warm-delete-001" : {
      "step" : "check-allocation",
      "step_time_millis" : 1587064836108,
      "step_info" : {
        "message" : "Waiting for [1] shards to be allocated to nodes matching the given filters",
        "shards_left_to_allocate" : 1,
        "all_shards_active" : true,
        "actual_replicas" : 1
      }
    },
    "hot-warm-delete-000002" : {
      "step" : "check-allocation",
      "step_time_millis" : 1587066637855,
      "step_info" : {
        "message" : "Waiting for [1] shards to be allocated to nodes matching the given filters",
        "shards_left_to_allocate" : 1,
        "all_shards_active" : true,
        "actual_replicas" : 1
      }
    },
    "hot-warm-delete-000003" : {
      "step" : "check-allocation",
      "step_time_millis" : 1587067838272,
      "step_info" : {
        "message" : "Waiting for [1] shards to be allocated to nodes matching the given filters",
        "shards_left_to_allocate" : 1,
        "all_shards_active" : true,
        "actual_replicas" : 1
      }
    },
    "hot-warm-delete-000004" : {
      "step" : "check-allocation",
      "step_time_millis" : 1587069037791,
      "step_info" : {
        "message" : "Waiting for [1] shards to be allocated to nodes matching the given filters",
        "shards_left_to_allocate" : 1,
        "all_shards_active" : true,
        "actual_replicas" : 1
      }
    },
    "hot-warm-delete-000005" : {
      "step" : "check-allocation",
      "step_time_millis" : 1587070237374,
      "step_info" : {
        "message" : "Waiting for [1] shards to be allocated to nodes matching the given filters",
        "shards_left_to_allocate" : 1,
        "all_shards_active" : true,
        "actual_replicas" : 1
      }
    },
    "ilm-history-1-000001" : {
      "step" : "check-rollover-ready",
      "step_time_millis" : 1586562150643
    }
  }
}
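
To dig into why the check-allocation step keeps waiting, the cluster allocation explain API can be pointed at one of the stuck indices. A sketch, assuming a single primary shard (shard 0) and asking about the replica copy, since the explain output above shows all shards active with one copy still unplaceable:

GET _cluster/allocation/explain
{
  "index": "hot-warm-delete-000002",
  "shard": 0,
  "primary": false
}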

I'm not entirely sure how to troubleshoot this, so any help is greatly appreciated. Hopefully this is enough to give an idea of where I am, but please ask for more information if that would help diagnose the problem.

I figured out what was going on. Since I only started two nodes and left replicas at the default of 1, the replica was allocated to my warm node. When the ILM policy tried to move the index on to the delete phase, that replica on the warm node caused the rollover to stall.
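
In case it helps anyone else, a sketch of the two ways I would avoid this on a two-node test cluster; the values below are only suited to testing, not production. Option 1 is to drop replicas in the index template:

PUT _template/hot-warm-delete-temp
{
  "index_patterns": ["hot-warm-delete-*"],
  "settings": {
    "index.lifecycle.name": "hot-warm-delete",
    "index.lifecycle.rollover_alias": "hot-warm-delete-alias",
    "index.number_of_replicas": 0
  }
}

Option 2 is to let the warm phase remove the replica itself, since the ILM allocate action also accepts a number_of_replicas setting (only the warm-phase fragment of the policy is shown):

"allocate": {
  "number_of_replicas": 0,
  "require": {
    "data": "warm"
  }
}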
