Index replica shard relocation taking too long

A week ago, we had to recreate one of the 3 nodes of our ELK cluster due to an exception while trying to stop the service (Docker container). To solve it I had to kill the container process and then recreate the node. To let the node work with the same data directory I had to delete the lock files. I know these practices are not recommended, but at the time it was the only way I found to solve the issue.

After all this, the great majority of the shards were assigned, but some replica shards were not. One week later we are still fighting with some of these replica shards. About 70% of them are from indices of around 100 GB. From time to time relocation fails and no reason is shown in the allocation explanation.

Any advice?

Thanks in advance.

Several issues can cause this. A few things to check:

  1. Check the available disk space on the target node (see the first sketch after this list).
  2. Force shard allocation with /_cluster/reroute (second sketch below): Cluster reroute API | Elasticsearch Guide [8.17] | Elastic
  3. Check the ongoing and completed shard recoveries with /_cat/recovery (third sketch below): cat recovery API | Elasticsearch Guide [8.17] | Elastic.
     If you find a corrupted shard and a healthy copy of it exists on another node, deleting the corrupted copy is generally safe, but make sure you read this first: Open index API | Elasticsearch Guide [8.17] | Elastic
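
For step 1, a minimal sketch of how you could check per-node disk usage via the cat allocation API. The host URL is an assumption; add authentication as needed for your cluster:

```python
import requests

ES = "http://localhost:9200"  # assumption: adjust host/credentials for your cluster

# Show per-node shard counts and disk usage so you can see whether the
# target node has enough free space below the allocation watermarks.
resp = requests.get(f"{ES}/_cat/allocation?v", timeout=30)
resp.raise_for_status()
print(resp.text)
```

If disk.avail on the target node is close to the high watermark, the replicas will not be allocated there no matter how often the cluster retries.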
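For step 2, a sketch of the reroute call. The `retry_failed` flag retries allocations that hit the max-retries limit; the explicit `allocate_replica` command is only needed if you want to pin a replica to a specific node. Index and node names below are placeholders:

```python
import requests

ES = "http://localhost:9200"  # assumption: adjust host/credentials for your cluster

# Retry allocations that previously failed too many times.
resp = requests.post(f"{ES}/_cluster/reroute?retry_failed=true", timeout=60)
resp.raise_for_status()

# Or explicitly allocate an unassigned replica to a given node
# (index, shard and node values are placeholders).
body = {
    "commands": [
        {"allocate_replica": {"index": "my-index-000001", "shard": 0, "node": "node-3"}}
    ]
}
resp = requests.post(f"{ES}/_cluster/reroute", json=body, timeout=60)
print(resp.json())
```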
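For step 3, a sketch that lists the recoveries still in flight and asks the cluster why a specific replica is unassigned; the latter usually gives a clearer reason than the bare "explanation" field. Again, index/shard values are placeholders for one of your stuck replicas:

```python
import requests

ES = "http://localhost:9200"  # assumption: adjust host/credentials for your cluster

# List only the recoveries that are still running, with progress per shard.
resp = requests.get(f"{ES}/_cat/recovery?v&active_only=true", timeout=30)
print(resp.text)

# Ask the cluster why a specific replica shard is unassigned.
explain = requests.post(
    f"{ES}/_cluster/allocation/explain",
    json={"index": "my-index-000001", "shard": 0, "primary": False},
    timeout=30,
)
print(explain.json())
```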