Hi
I have created an ILM policy and an index template, and I create daily indices with Logstash on an Elasticsearch 7.10 cluster. I have some fast, expensive nodes with the hot role (among other roles) and a few slow, inexpensive nodes with the warm role only. I expected that my ILM policy would move the indices in the warm phase to the warm nodes. But this is not happening (apparently I am doing something wrong).
Nodes
Some nodes are dhilrst (i.e. h, hot, included) and some are w (warm) only.
$ curl localhost:9200/_cat/nodes?v
ip heap.percent ram.percent cpu load_1m load_5m load_15m node.role master name
...
10.40.21.141 34 96 6 0.67 0.56 0.66 dhilrst - ip-10-40-21-141
10.40.21.136 43 92 1 0.07 0.02 0.03 dhilrst - ip-10-40-21-136
10.40.23.24 43 94 6 0.18 0.30 0.41 dhilrst - ip-10-40-23-24
10.40.21.234 30 99 3 0.00 0.08 0.09 w - ip-10-40-21-234
10.40.23.37 56 99 4 0.02 0.05 0.08 w - ip-10-40-23-37
10.40.22.135 50 99 4 0.00 0.02 0.07 w - ip-10-40-22-135
...
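For readers unfamiliar with the node.role column, the letters can be expanded with a small lookup. This is only a sketch: the mapping is the set of abbreviations that the cat nodes API documents for 7.10 as far as I know, and the names ROLE_LETTERS and decode_roles are made up for illustration.

```python
from typing import List

# Abbreviations used in the node.role column of _cat/nodes (ES 7.10).
ROLE_LETTERS = {
    "c": "data_cold",
    "d": "data",
    "h": "data_hot",
    "i": "ingest",
    "l": "ml",
    "m": "master",
    "r": "remote_cluster_client",
    "s": "data_content",
    "t": "transform",
    "v": "voting_only",
    "w": "data_warm",
}


def decode_roles(letters: str) -> List[str]:
    """Expand a node.role string such as 'dhilrst' into role names."""
    if letters == "-":
        return []  # "-" marks a coordinating-only node
    return [ROLE_LETTERS[ch] for ch in letters]


if __name__ == "__main__":
    # The two role strings that appear in the listing above.
    for letters in ("dhilrst", "w"):
        print(letters, "->", decode_roles(letters))
```

Note that the dhilrst nodes carry the generic data role (d) in addition to data_hot, while the w nodes carry data_warm only.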
ILM is enabled
$ curl -s 'localhost:9200/_ilm/status' | jq .
{
"operation_mode": "RUNNING"
}
No cluster-level shard allocation and routing settings (the transient include/exclude _ip filters are empty)
$ curl -s localhost:9200/_cluster/settings | jq .
{
"persistent": {
"xpack": {
"monitoring": {
"elasticsearch": {
"collection": {
"enabled": "false"
}
}
}
}
},
"transient": {
"cluster": {
"routing": {
"allocation": {
"include": {
"_ip": ""
},
"exclude": {
"_ip": ""
}
}
}
}
}
}
ILM policy and template
I expect that this policy does the following:
- index age: 0ms - 1 day => phase hot (index on hot node)
- index age: 1 day - 21 days => phase warm (index on warm node)
- index age: 21 days - inf => phase delete (index gets deleted)
$ curl -s localhost:9200/_ilm/policy/foo?pretty | jq .foo
{
"version": 5,
"modified_date": "2021-04-13T15:29:32.797Z",
"policy": {
"phases": {
"hot": {
"min_age": "0ms",
"actions": {}
},
"delete": {
"min_age": "21d",
"actions": {
"delete": {
"delete_searchable_snapshot": true
}
}
},
"warm": {
"min_age": "1d",
"actions": {
"migrate": {
"enabled": true
}
}
}
}
}
}
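The phase boundaries I expect from this policy can be written down as a small function. This is a sketch of the expectation only, not of how ILM is implemented: the min_age values are copied from the policy above, and PHASES and expected_phase are hypothetical names. ILM also evaluates conditions periodically, so a real transition can lag slightly behind min_age.

```python
from datetime import timedelta

# (phase, min_age) pairs copied from the policy above.
PHASES = [
    ("hot", timedelta(0)),
    ("warm", timedelta(days=1)),
    ("delete", timedelta(days=21)),
]


def expected_phase(age: timedelta) -> str:
    """Return the last phase whose min_age the index age has reached."""
    current = PHASES[0][0]
    for name, min_age in sorted(PHASES, key=lambda p: p[1]):
        if age >= min_age:
            current = name
    return current


if __name__ == "__main__":
    for age in (timedelta(hours=12), timedelta(days=2), timedelta(days=30)):
        print(age, "->", expected_phase(age))
```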
Index template
$ curl -s 'localhost:9200/_index_template' | jq '.index_templates[] | select(.name=="foo")'
{
"name": "foo",
"index_template": {
"index_patterns": [
"foo.logstash-*"
],
"template": {
"settings": {
"index": {
"lifecycle": {
"name": "foo"
}
}
}
},
"composed_of": []
}
}
Today's (2021.05.10) index
Today's index has a _tier_preference of data_content (the default), so it is allocated to nodes with the content role.
$ curl -s 'localhost:9200/foo.logstash-2021.05.10' | jq '."foo.logstash-2021.05.10".settings'
{
"index": {
"lifecycle": {
"name": "foo"
},
"routing": {
"allocation": {
"include": {
"_tier_preference": "data_content"
}
}
},
"number_of_shards": "1",
"provided_name": "foo.logstash-2021.05.10",
"creation_date": "1620604801015",
"number_of_replicas": "1",
"uuid": "J9uINSGSSuWjO60xgO_PRA",
"version": {
"created": "7100199"
}
}
}
Indeed, the index's shards are on dhilrst nodes.
$ curl -s 'localhost:9200/_cat/shards' | grep foo.logstash-2021.05.10
foo.logstash-2021.05.10 0 p STARTED 30069059 38gb 10.40.21.141 ip-10-40-21-141
foo.logstash-2021.05.10 0 r STARTED 30037707 36.4gb 10.40.23.24 ip-10-40-23-24
ILM-wise, the index is in its hot phase, without any errors.
$ curl -s 'localhost:9200/foo.logstash-2021.05.10/_ilm/explain?pretty'
{
"indices" : {
"foo.logstash-2021.05.10" : {
"index" : "foo.logstash-2021.05.10",
"managed" : true,
"policy" : "foo",
"lifecycle_date_millis" : 1620604801015,
"age" : "16.75h",
"phase" : "hot",
"phase_time_millis" : 1620604828556,
"action" : "complete",
"action_time_millis" : 1620604826702,
"step" : "complete",
"step_time_millis" : 1620604828556,
"phase_execution" : {
"policy" : "foo",
"phase_definition" : {
"min_age" : "0ms",
"actions" : { }
},
"version" : 5,
"modified_date_in_millis" : 1618327772797
}
}
}
}
Yesterday's (2021.05.09) index (where the problem is shown)
Yesterday's index has a _tier_preference of data_warm,data_hot.
$ curl -s 'localhost:9200/foo.logstash-2021.05.09' | jq '."foo.logstash-2021.05.09".settings'
{
"index": {
"lifecycle": {
"name": "foo"
},
"routing": {
"allocation": {
"include": {
"_tier_preference": "data_warm,data_hot"
}
}
},
"number_of_shards": "1",
"provided_name": "foo.logstash-2021.05.09",
"creation_date": "1620518400615",
"number_of_replicas": "1",
"uuid": "SSHXGJZmS52jC1gpakOIyw",
"version": {
"created": "7100199"
}
}
}
I'd expect the index to be on a warm node, BUT it is not: it's on dhilrst nodes. This is where the problem shows.
$ curl -s 'localhost:9200/_cat/shards' | grep foo.logstash-2021.05.09
foo.logstash-2021.05.09 0 p STARTED 40757640 48.2gb 10.40.23.24 ip-10-40-23-24
foo.logstash-2021.05.09 0 r STARTED 40757640 48.2gb 10.40.21.136 ip-10-40-21-136
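To make the expectation concrete, here is a minimal model of how a _tier_preference list might be resolved: the first tier in the list that has at least one member node is chosen, and any node in that tier is eligible. It rests on one loudly stated assumption, taken from my reading of the Elasticsearch data tier documentation: a node with the generic data role counts as a member of every tier. The node names and the functions node_tiers and eligible_nodes are hypothetical.

```python
from typing import Dict, List, Set

TIERS = {"data_hot", "data_warm", "data_cold", "data_content"}


def node_tiers(roles: Set[str]) -> Set[str]:
    """Tiers a node belongs to.

    ASSUMPTION: the generic 'data' role implies membership in every tier.
    """
    if "data" in roles:
        return set(TIERS)
    return roles & TIERS


def eligible_nodes(preference: str, nodes: Dict[str, Set[str]]) -> List[str]:
    """Return the nodes of the first preferred tier that has any member."""
    for tier in preference.split(","):
        members = sorted(
            name for name, roles in nodes.items() if tier in node_tiers(roles)
        )
        if members:
            return members
    return []


if __name__ == "__main__":
    # Hypothetical two-node cluster mirroring the two role sets in this post.
    nodes = {
        "node-a": {"data", "data_hot", "data_content", "ingest"},
        "node-b": {"data_warm"},
    }
    print(eligible_nodes("data_warm,data_hot", nodes))
```

If that assumption holds, both kinds of node would satisfy data_warm in this model, so it may be worth checking whether the generic data role on the dhilrst nodes plays a part here.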
ILM-wise, the index is in its warm phase, without any errors.
$ curl -s 'localhost:9200/foo.logstash-2021.05.09/_ilm/explain?pretty'
{
"indices" : {
"foo.logstash-2021.05.09" : {
"index" : "foo.logstash-2021.05.09",
"managed" : true,
"policy" : "foo",
"lifecycle_date_millis" : 1620518400615,
"age" : "1.69d",
"phase" : "warm",
"phase_time_millis" : 1620604818008,
"action" : "complete",
"action_time_millis" : 1620604829692,
"step" : "complete",
"step_time_millis" : 1620604829692,
"phase_execution" : {
"policy" : "foo",
"phase_definition" : {
"min_age" : "1d",
"actions" : {
"migrate" : {
"enabled" : true
}
}
},
"version" : 5,
"modified_date_in_millis" : 1618327772797
}
}
}
}
Replicating the settings on a Docker test cluster
I have replicated the settings on a Docker ES 7.10 cluster, and there the shards move to the warm node as expected.