Failed to detect running cluster - instance was not detected as running in time

I have ECE 2.5.1 with more than 30 GB of spare capacity across 3 zones, but I am unable to create new clusters. The full message is:

Plan change failed: [ClusterFailure:InstanceDidNotStartWhileWaitingForRunning]: Failed to detect running cluster - instance was not detected as running in time. Check the health of the cluster, and look at the instance and/or allocator logs to determine if there were any issues starting. Details: [{"details":"The state did not become the desired one before [600000 milliseconds] elapsed. Last error was: [Instance is not running [instance-0000000001]. Please check allocator/docker logs.]"}]

I looked for more logs and found this allocator entry, though I am not sure it is relevant:

allocator/logs/allocator.log:[2020-11-03 17:11:18,345][ERROR][no.found.runner.allocation.elasticsearch.ElasticsearchDockerContainerManager] Unexpected error during allocation {"ec_container_name":"instance-0000000000","cluster_type":"elasticsearch","ec_container_group":"3b011a14a88d4e919525dc8a974e43e2","cluster_id":"3b011a14a88d4e919525dc8a974e43e2","instance_id":"instance-0000000000","ec_container_kind":"elasticsearch"}
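To search the allocator log for other entries tied to the same cluster or instance, a line like this can be split into its parts. This is only a rough sketch: the layout (bracketed timestamp, level, and logger, then a message, then a trailing JSON object) is assumed from the single line above, and `parse_allocator_line` is a helper name made up here, not part of ECE.

```python
import json
import re

# Assumed layout, inferred only from the one log line quoted above:
#   [timestamp][LEVEL][logger] message {json context}
LINE_RE = re.compile(
    r"\[(?P<timestamp>[^\]]+)\]\[(?P<level>[^\]]+)\]\[(?P<logger>[^\]]+)\]\s+"
    r"(?P<message>.*?)\s*(?P<context>\{.*\})?\s*$"
)


def parse_allocator_line(line):
    """Return a dict of fields for one log line, or None if it does not match."""
    match = LINE_RE.search(line)
    if match is None:
        return None
    fields = match.groupdict()
    raw_context = fields.pop("context")
    try:
        # The trailing JSON object carries cluster_id, instance_id, and so on.
        fields["context"] = json.loads(raw_context) if raw_context else {}
    except json.JSONDecodeError:
        # Keep the line usable even if the trailing part is not valid JSON.
        fields["context"] = {}
    return fields
```

Filtering parsed lines on `context["cluster_id"]` or `context["instance_id"]` should then show whether more detail was logged around the same timestamp.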

Checking the disk usage on the hosts, I found that the root volume was 95% full and seems to contain Docker-related content. So I initially tried to reduce the disk requirement by choosing a version of Elasticsearch for my cluster that was already in use instead of the newest version. I seemed to get further with this approach (the logs above) than when I was trying the new version of Elasticsearch. With the new version, I got errors in the Elasticsearch and Kibana activities of ECE.

How can I diagnose this and fix ECE so I can create more clusters? Is the 95% full root volume a problem even though /mnt/data is only 1% full?
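For comparing the two volumes on each host, here is a minimal sketch of the check I have in mind. The paths `/` and `/mnt/data` are the ones mentioned above; the 90% warning threshold is an arbitrary value chosen for illustration, not a documented ECE or Docker limit, and the function names are my own.

```python
import shutil


def percent_used(path):
    """Percentage of the filesystem holding `path` that is in use.

    Computed as used / total, so it can differ slightly from `df`, which
    leaves reserved blocks out of its percentage.
    """
    usage = shutil.disk_usage(path)
    if usage.total == 0:
        return 0.0
    return 100.0 * usage.used / usage.total


def report(paths, warn_at=90.0):
    """Return (path, percent, status) rows; warn_at is an assumed threshold."""
    rows = []
    for path in paths:
        try:
            pct = percent_used(path)
        except OSError:
            # The path does not exist or cannot be read on this host.
            rows.append((path, None, "missing"))
            continue
        rows.append((path, round(pct, 1), "WARN" if pct >= warn_at else "ok"))
    return rows


if __name__ == "__main__":
    for path, pct, status in report(["/", "/mnt/data"]):
        shown = "n/a" if pct is None else f"{pct}%"
        print(f"{path}: {shown} {status}")
```

Running this on every allocator host would at least show whether the nearly full root volume is common to all three zones or limited to the hosts where the instance failed to start.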

