Hi
The health of my index has turned yellow, seemingly because a replica shard cannot be allocated. This index was created by the "shrink" action defined in my ILM policy. The desired result is that the index is shrunk and moved to the "warm" data nodes. I don't think this can complete until all shards are assigned.
I'll post the output of _cluster/allocation/explain and _cat/shards/<index>, along with my ILM policy, index template and Elasticsearch logs, below. Please let me know if you have any idea at all why this replica is now unassigned.
_cat/shards/shrink-filebeat-haproxy-production-2019.10.08-000001?v
index                                                shard prirep state      docs      store  ip          node
shrink-filebeat-haproxy-production-2019.10.08-000001 0     p      STARTED    132097605 90gb   10.0.16.212 es-dn-warm-3.core.ld5.phg.io
shrink-filebeat-haproxy-production-2019.10.08-000001 0     r      STARTED    132097605 90.1gb 10.0.16.210 es-dn-warm-1.core.ld5.phg.io
shrink-filebeat-haproxy-production-2019.10.08-000001 0     r      UNASSIGNED
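For anyone checking several shrunk indices, a small helper like the one below can pull the unassigned copies out of _cat/shards?v text output. It is a hypothetical sketch, not part of any Elasticsearch client: it assumes whitespace-separated columns and node names without spaces, and it does not handle relocating rows (which carry extra "->" tokens in the node column).

```python
from typing import Dict, List

# Output copied from the post (header row plus one row per shard copy).
SAMPLE = """\
index shard prirep state docs store ip node
shrink-filebeat-haproxy-production-2019.10.08-000001 0 p STARTED 132097605 90gb 10.0.16.212 es-dn-warm-3.core.ld5.phg.io
shrink-filebeat-haproxy-production-2019.10.08-000001 0 r STARTED 132097605 90.1gb 10.0.16.210 es-dn-warm-1.core.ld5.phg.io
shrink-filebeat-haproxy-production-2019.10.08-000001 0 r UNASSIGNED
"""


def parse_cat_shards(text: str) -> List[Dict[str, str]]:
    """Turn _cat/shards?v output into one dict per shard copy.

    Unassigned rows leave the docs/store/ip/node cells blank, so missing
    trailing cells are filled with empty strings.
    """
    lines = [line for line in text.strip().splitlines() if line.strip()]
    header = lines[0].split()
    rows = []
    for line in lines[1:]:
        cells = line.split()
        cells += [""] * (len(header) - len(cells))  # pad short (unassigned) rows
        rows.append(dict(zip(header, cells)))
    return rows


def unassigned(rows: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Keep only copies whose state is UNASSIGNED."""
    return [row for row in rows if row["state"] == "UNASSIGNED"]


for row in unassigned(parse_cat_shards(SAMPLE)):
    print(row["index"], row["shard"], row["prirep"])
```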
_cluster/allocation/explain:
Moved to a comment because the post hit the maximum body length.
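When the explain output is this long, it can help to ask about one specific copy rather than letting the API pick the first unassigned shard it finds. The allocation explain API accepts index, shard and primary in its request body; the sketch below only builds such a request with the standard library and does not send it. The base URL is an assumption (adjust it, and add auth/TLS, for your cluster).

```python
import json
import urllib.request

ES_URL = "http://localhost:9200"  # assumption: replace with your cluster address


def allocation_explain_request(index: str, shard: int, primary: bool,
                               base_url: str = ES_URL) -> urllib.request.Request:
    """Build (but do not send) GET _cluster/allocation/explain for one shard copy."""
    body = {"index": index, "shard": shard, "primary": primary}
    return urllib.request.Request(
        f"{base_url}/_cluster/allocation/explain",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="GET",
    )


# Ask about a replica (primary=False) of shard 0 of the shrunk index.
req = allocation_explain_request(
    "shrink-filebeat-haproxy-production-2019.10.08-000001", 0, primary=False
)
# To actually send it: urllib.request.urlopen(req)
```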
ILM policy:
{
  "policy": {
    "phases": {
      "hot": {
        "min_age": "0ms",
        "actions": {
          "rollover": {
            "max_age": "30d",
            "max_size": "90gb"
          },
          "set_priority": {
            "priority": 100
          }
        }
      },
      "warm": {
        "min_age": "30d",
        "actions": {
          "allocate": {
            "include": {},
            "exclude": {},
            "require": {
              "data": "warm"
            }
          },
          "forcemerge": {
            "max_num_segments": 1
          },
          "set_priority": {
            "priority": 50
          },
          "shrink": {
            "number_of_shards": 1
          }
        }
      }
    }
  }
}
Index Template (some fields removed):
{
  "settings": {
    "index": {
      "mapping": {
        "total_fields": {
          "limit": "10000"
        }
      },
      "refresh_interval": "5s",
      "blocks": {
        "write": "true"
      },
      "provided_name": "filebeat-haproxy-production-2019.10.08-000001",
      "query": {
        ...
      },
      "creation_date": "1570537372676",
      "priority": "50",
      "number_of_replicas": "2",
      "uuid": "***",
      "version": {
        "created": "7030099"
      },
      "lifecycle": {
        "name": "filebeat-haproxy-production-ilm-policy",
        "rollover_alias": "filebeat-haproxy-production-ilm-alias",
        "indexing_complete": "true"
      },
      "codec": "best_compression",
      "routing": {
        "allocation": {
          "require": {
            "data": "warm",
            "_id": "***"
          }
        }
      },
      "number_of_shards": "3",
      "shard": {
        "check_on_startup": "checksum"
      }
    }
  }
}
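The settings above combine number_of_replicas: 2 with index.routing.allocation.require filters. Since Elasticsearch never places two copies of the same shard on one node, each of the three copies needs its own node that passes every require filter. The sketch below is a hypothetical helper (the node list is made up for illustration) that counts eligible nodes versus needed copies. It is a simplified model: it handles the built-in _id and _name keys plus custom node attributes with exact, comma-separated values, and it ignores wildcards, other built-in attributes and other deciders such as disk watermarks.

```python
from typing import Dict, List, Tuple


def matches_require(node: Dict, require: Dict[str, str]) -> bool:
    """Return True if the node satisfies every require filter (simplified model)."""
    for key, value in require.items():
        allowed = {v.strip() for v in value.split(",")}
        if key == "_id":
            actual = node["id"]
        elif key == "_name":
            actual = node["name"]
        else:
            actual = node.get("attrs", {}).get(key)  # custom node.attr.* value
        if actual not in allowed:
            return False
    return True


def eligible_vs_needed(nodes: List[Dict], require: Dict[str, str],
                       number_of_replicas: int) -> Tuple[int, int]:
    """Count nodes passing the filters and copies that each need a distinct node."""
    eligible = sum(1 for node in nodes if matches_require(node, require))
    needed = 1 + number_of_replicas  # one primary plus each replica
    return eligible, needed


# Hypothetical cluster layout, for illustration only.
NODES = [
    {"id": "a1", "name": "warm-1", "attrs": {"data": "warm"}},
    {"id": "b2", "name": "warm-2", "attrs": {"data": "warm"}},
    {"id": "c3", "name": "hot-1", "attrs": {"data": "hot"}},
]

eligible, needed = eligible_vs_needed(NODES, {"data": "warm"}, number_of_replicas=2)
if eligible < needed:
    print(f"only {eligible} eligible node(s) for {needed} copies; some stay unassigned")
```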
Elasticsearch logs (from the data node that is failing to allocate the replica):
Moved to a comment because the post hit the maximum body length.
Things that I've tried:
- Ran POST /_cluster/reroute?retry_failed=true to retry the shard allocation. The shard moves to the INITIALIZING state, then goes back to UNASSIGNED after a short time. The Elasticsearch log error mentioned above appears once initialization has failed.
- Set "cluster.routing.allocation.enable": "none", tried to allocate the replica manually using the /_cluster/reroute API, then re-enabled shard allocation. Same failure.
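The attempts above correspond roughly to the request sequence below. This sketch only builds the method, path and JSON body for each call; it assumes persistent rather than transient cluster settings (the post does not say which was used), and TARGET_NODE is a hypothetical placeholder because the node targeted for the manual allocation isn't stated.

```python
import json
from typing import Dict

INDEX = "shrink-filebeat-haproxy-production-2019.10.08-000001"
TARGET_NODE = "<target-node-name>"  # hypothetical placeholder, not from the post

# First attempt: retry allocations that previously failed too many times.
RETRY = ("POST", "/_cluster/reroute?retry_failed=true", None)


def disable_allocation_body() -> Dict:
    """Body for PUT _cluster/settings that stops automatic shard allocation."""
    return {"persistent": {"cluster.routing.allocation.enable": "none"}}


def allocate_replica_body(index: str, shard: int, node: str) -> Dict:
    """Body for POST _cluster/reroute asking for a replica copy on a given node."""
    return {"commands": [{"allocate_replica": {"index": index, "shard": shard, "node": node}}]}


def enable_allocation_body() -> Dict:
    """Body for PUT _cluster/settings; null resets the setting to its default."""
    return {"persistent": {"cluster.routing.allocation.enable": None}}


# Second attempt: disable allocation, allocate manually, re-enable allocation.
steps = [
    ("PUT", "/_cluster/settings", disable_allocation_body()),
    ("POST", "/_cluster/reroute", allocate_replica_body(INDEX, 0, TARGET_NODE)),
    ("PUT", "/_cluster/settings", enable_allocation_body()),
]

for method, path, body in steps:
    print(method, path)
    print(json.dumps(body, indent=2))
```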
Please let me know if you need any further logs/information