ILM Waiting for Allocation on Warm Node

Hi,

I'm running a 3-node cluster with monthly indices, each with 6 primary shards and 1 replica.

es-node-1 Hot node
es-node-2 Warm node
es-node-3 Cold node

I'm using an ILM policy with no rollover, and the monthly indices are stuck at the allocate action when moving to the warm node. They have been waiting about 4 days to allocate.

I tested this lifecycle setup a month ago and it worked fine, but now it is not working as expected. How can I resolve this issue?

ILM explain:

{
  "indices" : {
    "document-2019.10" : {
      "index" : "document-2019.10",
      "managed" : true,
      "policy" : "grid-ilm-policy",
      "lifecycle_date_millis" : 1571048211487,
      "phase" : "warm",
      "phase_time_millis" : 1573813256180,
      "action" : "allocate",
      "action_time_millis" : 1573813856327,
      "step" : "check-allocation",
      "step_time_millis" : 1573813856438,
      "step_info" : {
        "message" : "Waiting for [6] shards to be allocated to nodes matching the given filters",
        "shards_left_to_allocate" : 6,
        "all_shards_active" : true,
        "actual_replicas" : 1
      },
      "phase_execution" : {
        "policy" : "grid-ilm-policy",
        "phase_definition" : {
          "min_age" : "32d",
          "actions" : {
            "allocate" : {
              "include" : { },
              "exclude" : { },
              "require" : {
                "box_type" : "warm"
              }
            },
            "set_priority" : {
              "priority" : 50
            },
            "shrink" : {
              "number_of_shards" : 1
            }
          }
        },
        "version" : 9,
        "modified_date_in_millis" : 1571038556385
      }
    }
  }
}
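
For anyone reading a response like this, here is a minimal Python sketch that boils an ILM explain response down to one line per index, which makes the stuck step easier to spot when several indices are involved. The function name `summarize_ilm_explain` is made up for illustration, and the dictionary is a trimmed copy of the response above, not a live call.

```python
from typing import Any, Dict, List


def summarize_ilm_explain(response: Dict[str, Any]) -> List[str]:
    """Return one summary line per ILM-managed index in an explain response."""
    lines = []
    for name, info in response.get("indices", {}).items():
        if not info.get("managed"):
            continue  # unmanaged indices carry no phase/action/step
        step_info = info.get("step_info", {})
        lines.append(
            f"{name}: phase={info.get('phase')} action={info.get('action')} "
            f"step={info.get('step')} "
            f"shards_left={step_info.get('shards_left_to_allocate')}"
        )
    return lines


# Trimmed copy of the ILM explain output posted above.
explain = {
    "indices": {
        "document-2019.10": {
            "index": "document-2019.10",
            "managed": True,
            "policy": "grid-ilm-policy",
            "phase": "warm",
            "action": "allocate",
            "step": "check-allocation",
            "step_info": {
                "message": "Waiting for [6] shards to be allocated to nodes "
                           "matching the given filters",
                "shards_left_to_allocate": 6,
                "all_shards_active": True,
                "actual_replicas": 1,
            },
        }
    }
}

for line in summarize_ilm_explain(explain):
    print(line)
```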

Index settings:

{
  "document-2019.10" : {
    "settings" : {
      "index" : {
        "lifecycle" : {
          "name" : "grid-ilm-policy"
        },
        "routing" : {
          "allocation" : {
            "require" : {
              "box_type" : "warm"
            }
          }
        },
        "number_of_shards" : "6",
        "provided_name" : "document-2019.10",
        "creation_date" : "1571048211487",
        "priority" : "50",
        "number_of_replicas" : "1",
        "uuid" : "9YQB0XJ5SU2aT_L1ZLIWuw",
        "version" : {
          "created" : "7030299"
        }
      }
    }
  }
}

Use the cluster allocation explain API to find out why the indices are not relocating. Do you have enough disk space on the nodes that are to receive the shards? By default no shards can be allocated to a node once it gets more than 85% full.
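
One thing worth knowing about the allocation explain API: called without a body it only reports on unassigned shards. To ask why an already assigned shard is not moving, the request has to name the index, the shard number, and whether the primary or a replica is meant. Below is a rough Python sketch of that request. The base URL `http://localhost:9200`, the absence of authentication, and the helper names are assumptions for illustration only.

```python
import json
import urllib.request
from typing import Any, Dict


def explain_request_body(index: str, shard: int, primary: bool) -> Dict[str, Any]:
    """Build the body that points allocation explain at one specific shard copy."""
    return {"index": index, "shard": shard, "primary": primary}


def fetch_allocation_explain(base_url: str, body: Dict[str, Any]) -> Dict[str, Any]:
    """POST the body to _cluster/allocation/explain and return the parsed JSON.

    base_url is an assumption (e.g. "http://localhost:9200"); a secured
    cluster would also need credentials, which this sketch leaves out.
    """
    request = urllib.request.Request(
        f"{base_url}/_cluster/allocation/explain",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


# Ask about the replica copy of shard 0 of the stuck index.
body = explain_request_body("document-2019.10", shard=0, primary=False)
print(json.dumps(body))
# To actually send it against a reachable cluster:
# result = fetch_allocation_explain("http://localhost:9200", body)
```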

Hi Christian,

Thanks for your reply. I used this API, but it turns out there are no unassigned shards to explain.

GET /_cluster/allocation/explain

{
  "error": {
    "root_cause": [
      {
        "type": "remote_transport_exception",
        "reason": "[es-node-3][10.200.0.6:9300][cluster:monitor/allocation/explain]"
      }
    ],
    "type": "illegal_argument_exception",
    "reason": "unable to find any unassigned shards to explain [ClusterAllocationExplainRequest[useAnyUnassignedShard=true,includeYesDecisions?=false]"
  },
  "status": 400
}

GET /_cat/allocation?v

shards disk.indices disk.used disk.avail disk.total disk.percent host       ip         node
   202       59.1gb    80.1gb     16.5gb     96.6gb           82 10.200.0.4 10.200.0.4 es-node-1
   202         61gb    80.4gb    114.5gb      195gb           41 10.200.0.6 10.200.0.6 es-node-3
   204       56.1gb    75.1gb     70.7gb    145.8gb           51 10.200.0.5 10.200.0.5 es-node-2

Disk usage on the nodes:
es-node-1: 82%
es-node-2: 51%
es-node-3: 41%

So none of the nodes has reached the disk watermark (85%). I tested this ILM policy before with this configuration and it worked fine, but now I have been waiting for days.
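
The same check can be scripted. Here is a small Python sketch that parses the `_cat/allocation?v` output and lists nodes above the 85% low watermark. It assumes the default watermark is in effect and that the column layout matches the output posted in this thread; the function names are illustrative.

```python
from typing import Dict, List

# Default low disk watermark; assumed not to be overridden on this cluster.
LOW_WATERMARK_PERCENT = 85

CAT_ALLOCATION = """\
shards disk.indices disk.used disk.avail disk.total disk.percent host ip node
202 59.1gb 80.1gb 16.5gb 96.6gb 82 10.200.0.4 10.200.0.4 es-node-1
202 61gb 80.4gb 114.5gb 195gb 41 10.200.0.6 10.200.0.6 es-node-3
204 56.1gb 75.1gb 70.7gb 145.8gb 51 10.200.0.5 10.200.0.5 es-node-2
"""


def disk_percent_by_node(cat_output: str) -> Dict[str, int]:
    """Map node name to disk.percent, locating columns via the header row."""
    rows = [line.split() for line in cat_output.strip().splitlines()]
    header, data = rows[0], rows[1:]
    percent_col = header.index("disk.percent")
    node_col = header.index("node")
    return {row[node_col]: int(row[percent_col]) for row in data}


def nodes_over_watermark(
    percentages: Dict[str, int], limit: int = LOW_WATERMARK_PERCENT
) -> List[str]:
    """Return the sorted names of nodes whose disk usage exceeds the limit."""
    return sorted(node for node, pct in percentages.items() if pct > limit)


usage = disk_percent_by_node(CAT_ALLOCATION)
print(usage)
print("over watermark:", nodes_over_watermark(usage))
```

With the numbers above no node is over the limit, although es-node-1 at 82% is fairly close to it.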

Hello Anil,

Have you found a solution for your problem? I'm facing exactly the same issue on my ES Cloud Deployment. Indices are not rolling over, stating they're waiting for a node to become available.

Kind regards,

Eric V.

Hi ericv,

I'm using the policy to shrink indices to 1 primary and 1 replica shard. The primary shard gets assigned to the warm node, but a second warm node is needed for the replica, because Elasticsearch does not allow a primary and its replica to be assigned to the same node.
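
To put numbers on this, here is a back-of-the-envelope Python sketch. It only models the rule that two copies of the same shard cannot share a node and ignores disk watermarks and every other allocation decider, so treat it as a simplification. The function name is made up for illustration.

```python
def copies_left_unallocated(primaries: int, replicas: int, matching_nodes: int) -> int:
    """Count shard copies that cannot be placed on nodes matching the filter.

    Each shard has 1 + replicas copies, and at most one copy of a given
    shard can live on any single node.
    """
    copies_per_shard = 1 + replicas
    placeable_per_shard = min(copies_per_shard, matching_nodes)
    return primaries * (copies_per_shard - placeable_per_shard)


# 6 primaries, 1 replica, a single warm node: the replicas have nowhere to go.
print(copies_left_unallocated(primaries=6, replicas=1, matching_nodes=1))  # 6

# With a second warm node every copy can be placed.
print(copies_left_unallocated(primaries=6, replicas=1, matching_nodes=2))  # 0
```

The first result lines up with `"shards_left_to_allocate" : 6` in the ILM explain output at the top of the thread.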

So you need at least 2 warm nodes to hold 1 primary and 1 replica shard, or you have to store the index with 1 primary shard and no replica. I solved my issue by adding another "warm" node to my cluster. By the same reasoning, you have to add another "cold" node too if you want to store the indices with 1 replica.
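
For the "no replica" option, the ILM allocate action accepts a `number_of_replicas` setting, so the replicas can be dropped in the same step that moves the index to the warm node. Below is a rough Python sketch of what the warm phase could look like. The `min_age`, priority, shrink target, and `box_type` values are taken from the policy shown earlier; the function name is illustrative, and the other phases are left out, so this is not a complete policy (a PUT to `_ilm/policy/grid-ilm-policy` replaces the whole policy).

```python
import json
from typing import Any, Dict


def warm_phase_without_replicas() -> Dict[str, Any]:
    """Build a policy body whose warm phase drops replicas while relocating."""
    return {
        "policy": {
            "phases": {
                "warm": {
                    "min_age": "32d",
                    "actions": {
                        "set_priority": {"priority": 50},
                        "allocate": {
                            # With no replicas, one warm node can hold every copy.
                            "number_of_replicas": 0,
                            "require": {"box_type": "warm"},
                        },
                        "shrink": {"number_of_shards": 1},
                    },
                }
            }
        }
    }


# Print the body that would accompany PUT _ilm/policy/grid-ilm-policy.
print(json.dumps(warm_phase_without_replicas(), indent=2))
```

Keep in mind that running without replicas means losing the warm node loses the data, so this trades redundancy for fewer nodes.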

Thanks Anil for your response.

I already have 2 hot and 2 warm nodes so that wasn't the issue, and there was also more than enough disk space available.
My issue luckily got solved when I upgraded my deployment to 7.5.0 this morning.

Kind regards,

Eric V.

Same issue here, solved by upgrading to 7.5.0.