ILM hot/warm policy is not moving shards

Hi

I have created an ILM policy and an index template, and I create daily indices with Logstash on an Elasticsearch 7.10 cluster. Some of my nodes are fast and expensive and act as hot nodes (among other roles), and a few are slow and inexpensive and act as warm nodes only. I expected my ILM policy to move indices in the warm phase to the warm nodes, but this is not happening (apparently I am doing something wrong).

Nodes

Some nodes are dhilrst (i.e. the hot role h is included) and some are w (warm) only.

$ curl localhost:9200/_cat/nodes?v
ip           heap.percent ram.percent cpu load_1m load_5m load_15m node.role master name
...
10.40.21.141           34          96   6    0.67    0.56     0.66 dhilrst   -      ip-10-40-21-141
10.40.21.136           43          92   1    0.07    0.02     0.03 dhilrst   -      ip-10-40-21-136
10.40.23.24            43          94   6    0.18    0.30     0.41 dhilrst   -      ip-10-40-23-24
10.40.21.234           30          99   3    0.00    0.08     0.09 w         -      ip-10-40-21-234
10.40.23.37            56          99   4    0.02    0.05     0.08 w         -      ip-10-40-23-37
10.40.22.135           50          99   4    0.00    0.02     0.07 w         -      ip-10-40-22-135
...

ILM is enabled

$ curl -s 'localhost:9200/_ilm/status' | jq .
{
  "operation_mode": "RUNNING"
}

No Cluster-level shard allocation and routing settings

$ curl -s localhost:9200/_cluster/settings | jq .
{
  "persistent": {
    "xpack": {
      "monitoring": {
        "elasticsearch": {
          "collection": {
            "enabled": "false"
          }
        }
      }
    }
  },
  "transient": {
    "cluster": {
      "routing": {
        "allocation": {
          "include": {
            "_ip": ""
          },
          "exclude": {
            "_ip": ""
          }
        }
      }
    }
  }
}

ILM policy and template

I expect that this policy does the following:

  • index age: 0ms - 1day => phase hot (index on hot node)
  • index age: 1 day - 21 days => phase warm (index on warm node)
  • index age: 21 days - inf => phase delete (index gets deleted)

$ curl -s localhost:9200/_ilm/policy/foo?pretty | jq .foo
{
  "version": 5,
  "modified_date": "2021-04-13T15:29:32.797Z",
  "policy": {
    "phases": {
      "hot": {
        "min_age": "0ms",
        "actions": {}
      },
      "delete": {
        "min_age": "21d",
        "actions": {
          "delete": {
            "delete_searchable_snapshot": true
          }
        }
      },
      "warm": {
        "min_age": "1d",
        "actions": {
          "migrate": {
            "enabled": true
          }
        }
      }
    }
  }
}
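
The age-to-phase expectation listed before the policy can be written down as a small Python sketch. It is only a model of the min_age values in the foo policy, not how ILM is implemented: the real transitions happen on ILM's periodic poll, so they lag slightly behind min_age. The function name expected_phase is made up for this illustration.

```python
from datetime import timedelta

# (phase, min_age) pairs copied from the "foo" policy shown above.
PHASES = [
    ("hot", timedelta(0)),
    ("warm", timedelta(days=1)),
    ("delete", timedelta(days=21)),
]

def expected_phase(age):
    """Return the last phase whose min_age the index age has reached."""
    current = None
    for name, min_age in sorted(PHASES, key=lambda p: p[1]):
        if age >= min_age:
            current = name
    return current

# Ages as reported by _ilm/explain for the two indices in this thread.
print(expected_phase(timedelta(hours=16.75)))  # hot
print(expected_phase(timedelta(days=1.69)))    # warm
```

Both results match the phases reported by _ilm/explain, which suggests the policy itself is being applied and the question is only where the shards are allocated.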

Index template

$ curl -s 'localhost:9200/_index_template'  | jq '.index_templates[] | select(.name=="foo")'
{
  "name": "foo",
  "index_template": {
    "index_patterns": [
      "foo.logstash-*"
    ],
    "template": {
      "settings": {
        "index": {
          "lifecycle": {
            "name": "foo"
          }
        }
      }
    },
    "composed_of": []
  }
}

"today" (2021.05.10) index

Today's index has _tier_preference data_content (the default), so it gets assigned to nodes that can hold content data.

$ curl -s 'localhost:9200/foo.logstash-2021.05.10' | jq '."foo.logstash-2021.05.10".settings'
{
  "index": {
    "lifecycle": {
      "name": "foo"
    },
    "routing": {
      "allocation": {
        "include": {
          "_tier_preference": "data_content"
        }
      }
    },
    "number_of_shards": "1",
    "provided_name": "foo.logstash-2021.05.10",
    "creation_date": "1620604801015",
    "number_of_replicas": "1",
    "uuid": "J9uINSGSSuWjO60xgO_PRA",
    "version": {
      "created": "7100199"
    }
  }
}

Indeed, the index's shards are on dhilrst nodes.

$ curl -s 'localhost:9200/_cat/shards' | grep foo.logstash-2021.05.10
foo.logstash-2021.05.10          0 p STARTED  30069059     38gb 10.40.21.141 ip-10-40-21-141
foo.logstash-2021.05.10          0 r STARTED  30037707   36.4gb 10.40.23.24  ip-10-40-23-24

ILM-wise the index is in its hot phase, without any errors

$ curl -s 'localhost:9200/foo.logstash-2021.05.10/_ilm/explain?pretty'
{
  "indices" : {
    "foo.logstash-2021.05.10" : {
      "index" : "foo.logstash-2021.05.10",
      "managed" : true,
      "policy" : "foo",
      "lifecycle_date_millis" : 1620604801015,
      "age" : "16.75h",
      "phase" : "hot",
      "phase_time_millis" : 1620604828556,
      "action" : "complete",
      "action_time_millis" : 1620604826702,
      "step" : "complete",
      "step_time_millis" : 1620604828556,
      "phase_execution" : {
        "policy" : "foo",
        "phase_definition" : {
          "min_age" : "0ms",
          "actions" : { }
        },
        "version" : 5,
        "modified_date_in_millis" : 1618327772797
      }
    }
  }
}

Yesterday's (2021.05.09) index (where the problem is shown)

Yesterday's index has _tier_preference data_warm,data_hot.

$ curl -s 'localhost:9200/foo.logstash-2021.05.09' | jq '."foo.logstash-2021.05.09".settings'
{
  "index": {
    "lifecycle": {
      "name": "foo"
    },
    "routing": {
      "allocation": {
        "include": {
          "_tier_preference": "data_warm,data_hot"
        }
      }
    },
    "number_of_shards": "1",
    "provided_name": "foo.logstash-2021.05.09",
    "creation_date": "1620518400615",
    "number_of_replicas": "1",
    "uuid": "SSHXGJZmS52jC1gpakOIyw",
    "version": {
      "created": "7100199"
    }
  }
}

I'd expect the index to be on a warm node, BUT it is not: it's on dhilrst nodes. This is the problem.

$ curl -s 'localhost:9200/_cat/shards' | grep foo.logstash-2021.05.09
foo.logstash-2021.05.09          0 p STARTED  40757640   48.2gb 10.40.23.24  ip-10-40-23-24
foo.logstash-2021.05.09          0 r STARTED  40757640   48.2gb 10.40.21.136 ip-10-40-21-136

ILM-wise the index is in its warm phase without any errors

$ curl -s 'localhost:9200/foo.logstash-2021.05.09/_ilm/explain?pretty'
{
  "indices" : {
    "foo.logstash-2021.05.09" : {
      "index" : "foo.logstash-2021.05.09",
      "managed" : true,
      "policy" : "foo",
      "lifecycle_date_millis" : 1620518400615,
      "age" : "1.69d",
      "phase" : "warm",
      "phase_time_millis" : 1620604818008,
      "action" : "complete",
      "action_time_millis" : 1620604829692,
      "step" : "complete",
      "step_time_millis" : 1620604829692,
      "phase_execution" : {
        "policy" : "foo",
        "phase_definition" : {
          "min_age" : "1d",
          "actions" : {
            "migrate" : {
              "enabled" : true
            }
          }
        },
        "version" : 5,
        "modified_date_in_millis" : 1618327772797
      }
    }
  }
}

Replicate settings on a docker test cluster

I have replicated the settings on a Docker ES 7.10 cluster, and there the shards move to the warm node as expected.

Thanks for using Elasticsearch.

The shards are not moving to the warm tier because your hot nodes also have the generic data role. A node with the data role can host any of the specialised data tiers (so your dhilrst nodes can host the warm tier too, due to the d role).

You'd have to remove the generic data role from the node roles configuration of your hot nodes so that they only host hot and content data.
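
To make the role letters concrete, here is a rough Python sketch of the rule described above. It assumes the letter-to-role mapping documented for the node.role column of _cat/nodes in 7.10, and it deliberately ignores every other allocation decider (disk watermarks, awareness, and so on). The helper names expand_roles and can_host_tier are invented for this example.

```python
# Role letters as printed in the node.role column of _cat/nodes (7.10).
ROLE_LETTERS = {
    "c": "data_cold",
    "d": "data",
    "h": "data_hot",
    "i": "ingest",
    "l": "ml",
    "m": "master",
    "r": "remote_cluster_client",
    "s": "data_content",
    "t": "transform",
    "v": "voting_only",
    "w": "data_warm",
}

def expand_roles(letters):
    """Turn a string such as 'dhilrst' into a set of role names."""
    return {ROLE_LETTERS[ch] for ch in letters if ch != "-"}

def can_host_tier(letters, tier):
    """A node can host a tier if it has that tier's role or the generic data role."""
    roles = expand_roles(letters)
    return "data" in roles or tier in roles

print(can_host_tier("dhilrst", "data_warm"))  # True: the d role matches every tier
print(can_host_tier("hilrst", "data_warm"))   # False once the d role is removed
print(can_host_tier("w", "data_warm"))        # True
```

With the d role present, the current nodes already satisfy the data_warm preference, so nothing forces the shards to move. Without it, only the w nodes qualify.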

@andreidan thanks for the prompt reply.

I understand you suggest that we remove the generic data role from all the nodes of the cluster. I would like to ask how this will affect indices that are not managed with the data tiers (warm/hot), both new and existing ones.

Let me give some examples:

index created AFTER we enabled ILM policies:

Indices are created automatically from Logstash. In this example the index has the default (?) "_tier_preference": "data_content"

$ curl -s 'localhost:9200/lala-2021.w06/_ilm/explain?pretty'
{
  "indices" : {
    "lala-2021.w06" : {
      "index" : "lala-2021.w06",
      "managed" : false
    }
  }
}

$ curl -s 'localhost:9200/lala-2021.w06/' | jq '."lala-2021.w06".settings'
{
  "index": {
    "routing": {
      "allocation": {
        "include": {
          "_tier_preference": "data_content"
        }
      }
    },
    "number_of_shards": "1",
    "provided_name": "lala-2021.w06",
    "creation_date": "1612742422237",
    "number_of_replicas": "1",
    "uuid": "R804zhniR7SMo7_QAePH0g",
    "version": {
      "created": "7100199"
    }
  }
}

index created BEFORE we enabled ILM policies:

We have indices that were created before the introduction of ILM. In this example, the index has no routing allocation at all

$ curl -s 'localhost:9200/lala-2020.w40/_ilm/explain?pretty'
{
  "indices" : {
    "lala-2020.w40" : {
      "index" : "lala-2020.w40",
      "managed" : false
    }
  }
}

$ curl -s 'localhost:9200/lala-2020.w40/' | jq '."lala-2020.w40".settings'
{
  "index": {
    "creation_date": "1601856015556",
    "number_of_shards": "1",
    "number_of_replicas": "1",
    "uuid": "UsUCSLCFSomidJ2WnSa83Q",
    "version": {
      "created": "7070199",
      "upgraded": "7100199"
    },
    "provided_name": "lala-2020.w40"
  }
}

The data_content preference is due to the automatic data tier allocation of new indices (see "Data tiers" in the Elasticsearch Guide), which applies irrespective of whether the index is managed by ILM.

Regarding "the index has no routing allocation at all":

Indices without any allocation rules can reside on any node capable of holding data (i.e. a node with any of the data tier-specific roles can hold them).
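
The three cases from this thread can be compared with a simplified Python model. It assumes the hot nodes have had the generic data role removed (leaving the roles behind hilrst), it uses made-up node names hot-1 and warm-1, and it covers only the tier preference, none of the other allocation rules. The function allowed_nodes is hypothetical, not an Elasticsearch API.

```python
# Data roles that exist in 7.10.
DATA_ROLES = {"data", "data_content", "data_hot", "data_warm", "data_cold"}

def allowed_nodes(nodes, tier_preference=None):
    """Return the node names an index may be allocated to, by tier preference only."""
    data_nodes = {name: roles for name, roles in nodes.items() if roles & DATA_ROLES}
    if not tier_preference:
        # No allocation rules at all: any data-capable node qualifies.
        return sorted(data_nodes)
    # Otherwise use the first tier in the preference list that has any node.
    for tier in tier_preference.split(","):
        hits = sorted(
            name for name, roles in data_nodes.items()
            if "data" in roles or tier in roles
        )
        if hits:
            return hits
    return []

nodes = {
    "hot-1": {"data_hot", "data_content", "ingest", "ml", "remote_cluster_client", "transform"},
    "warm-1": {"data_warm"},
}
print(allowed_nodes(nodes, "data_content"))        # ['hot-1']
print(allowed_nodes(nodes))                        # ['hot-1', 'warm-1']
print(allowed_nodes(nodes, "data_warm,data_hot"))  # ['warm-1']
```

Under these assumptions, an index like lala-2021.w06 (data_content) stays on the hot nodes, while an index like lala-2020.w40 (no allocation rules) may end up on either kind of node, including the warm ones.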
