Nodes are bootlooping when attempting to start cluster

BryanPerkins · July 2, 2018, 8:37pm

This issue cropped up over the weekend. I'm running a 4-instance ECE setup, and when I try to spin up a cluster on 3 of these instances, I get the following error:

Unexpected error during step: [forced-cluster-reboot]: [no.found.constructor.steps.waiting.ServerBootloopingException: Instance is bootlooping [ElasticsearchInstance(ElasticsearchCluster(b1f1c59858344ab0a38f223013a578ee),instance-0000000006)]]

I'm not sure what is causing this, since it worked just 2 days ago, but it's preventing me from doing anything at all with the cluster.

Yuri · July 3, 2018, 9:22am

Hi Bryan,
There can be many reasons for this, so I would suggest checking the cluster's logs to find the root cause (you can find them in the logging-and-metrics cluster). Usually it happens for one of the following reasons:

insufficient memory allocated to a node.
insufficient disk quota.

To work around either of these issues, this API call can be useful:

curl -u root -X PUT \
  "https://$COORDINATOR_HOST:12443/api/v1/clusters/elasticsearch/$CLUSTER_ID/instances/$COMMA_SEPARATED_BOOTLOOPING_INSTANCES_IDS/settings?restart_after_update=true" \
  -d '{
  "instance_capacity": 8192
}'

The command above overrides the memory quota for a particular instance (or instances), but it does not change the cluster plan. This means the override is lost the next time you apply a plan to the cluster.

After the cluster starts and gets synced, I recommend increasing the memory quota for the cluster by changing its capacity in the UI.
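
The instances segment of that URL takes a comma-separated list of instance IDs. A minimal sketch of assembling it in the shell is below; the helper names `join_ids` and `build_settings_url`, the host `coordinator.example.com`, and the second instance ID are illustrative assumptions, not part of ECE. It only prints the URL and does not call the API.

```shell
#!/bin/sh
# Join all arguments with commas (hypothetical helper).
join_ids() (
  IFS=,
  printf '%s\n' "$*"
)

# Build the instance-settings URL from a host, a cluster ID and
# one or more instance IDs (hypothetical helper).
build_settings_url() {
  host=$1
  cluster=$2
  shift 2
  printf 'https://%s:12443/api/v1/clusters/elasticsearch/%s/instances/%s/settings?restart_after_update=true\n' \
    "$host" "$cluster" "$(join_ids "$@")"
}

# Example values: the cluster ID and first instance ID come from the error
# message in this thread; the host and second instance ID are made up.
url=$(build_settings_url coordinator.example.com \
  b1f1c59858344ab0a38f223013a578ee \
  instance-0000000006 instance-0000000007)
echo "$url"

# The URL could then be passed to the curl call shown earlier, e.g.:
#   curl -u root -X PUT "$url" -d '{"instance_capacity": 8192}'
```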

BryanPerkins · July 3, 2018, 4:16pm

As it turns out, the issue was that every ECE instance had run out of storage on the root filesystem, despite there being hundreds of gigabytes remaining in /mnt/data. That caused the error I mentioned in this thread, along with many others, and the errors persisted even after I expanded the root storage. I ended up having to reinstall ECE because a cluster would not change its configuration, start up, or be deleted.
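
A minimal sketch for spotting this condition early: it compares used space on the root filesystem and on /mnt/data using `df`. The function name `fs_usage_pct` and the 90% threshold are illustrative assumptions, and the script would need to be run on each host.

```shell
#!/bin/sh
# Print the used-space percentage (integer, no % sign) of the
# filesystem that holds the given path.
fs_usage_pct() {
  df -P "$1" | awk 'NR==2 { sub(/%/, "", $5); print $5 }'
}

THRESHOLD=90  # hypothetical warning threshold, in percent

for path in / /mnt/data; do
  # Skip paths that do not exist on this host.
  [ -d "$path" ] || { echo "$path: not present, skipping"; continue; }
  pct=$(fs_usage_pct "$path")
  if [ "$pct" -ge "$THRESHOLD" ]; then
    echo "WARNING: $path is ${pct}% full"
  else
    echo "$path: ${pct}% used"
  fi
done
```

Checking both paths separately matters here because plenty of free space under /mnt/data says nothing about the root filesystem.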

system · July 17, 2018, 4:16pm

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.
