Nodes are bootlooping when attempting to start cluster

This issue cropped up over the weekend. I'm running a 4-instance ECE setup, and when I try to spin up a cluster on 3 of these instances, I get the following error:

Unexpected error during step: [forced-cluster-reboot]: [no.found.constructor.steps.waiting.ServerBootloopingException: Instance is bootlooping [ElasticsearchInstance(ElasticsearchCluster(b1f1c59858344ab0a38f223013a578ee),instance-0000000006)]]

I'm not sure what is causing this, as it worked just 2 days ago, but it's preventing me from doing anything at all to the cluster.

Hi Brian,
There can be many reasons for this, so I'd suggest checking the cluster's logs to find the root cause (you can find them in the logging-and-metrics cluster). Usually, though, it happens for one of the following reasons:

  • insufficient memory allocated to a node.
  • insufficient disk quota.

To work around either of these, the following API call can be useful:

curl -u root -X PUT \
  "https://$COORDINATOR_HOST:12443/api/v1/clusters/elasticsearch/$CLUSTER_ID/instances/$COMMA_SEPARATED_BOOTLOOPING_INSTANCES_IDS/settings?restart_after_update=true" \
  -H 'Content-Type: application/json' \
  -d '{
  "instance_capacity": 8192
}'

The command above overrides the memory quota for a particular instance (or instances), but it does not change the cluster plan. This means the override will be lost the next time you apply a plan to the cluster.
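If several instances are bootlooping, the comma-separated ID list in that URL can be put together with a small helper. This is only a rough sketch: the function names are made up for illustration, the host is a placeholder, and instance-0000000007 is a hypothetical second instance. The port and path are simply copied from the command above.

```shell
#!/bin/sh
# Join all arguments with commas: "a" "b" -> "a,b".
join_ids() {
  ( IFS=,; printf '%s' "$*" )
}

# Build the instance-settings URL from the curl command above.
# $1 = coordinator host, $2 = cluster ID, remaining args = instance IDs.
build_settings_url() {
  host=$1; cluster=$2; shift 2
  printf 'https://%s:12443/api/v1/clusters/elasticsearch/%s/instances/%s/settings?restart_after_update=true\n' \
    "$host" "$cluster" "$(join_ids "$@")"
}

# Placeholder values, for illustration only.
build_settings_url coordinator.example.com b1f1c59858344ab0a38f223013a578ee \
  instance-0000000006 instance-0000000007
```

The printed URL can then be passed to curl in place of the hand-written one.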

After the cluster starts and gets synced, I recommend increasing the memory quota for the cluster by changing its capacity via the UI.


As it turns out, the issue was that every ECE instance had run out of storage on the root filesystem, despite there being hundreds of gigabytes free in /mnt/data. That caused the error I mentioned in this thread, along with many others, and the errors persisted even after I expanded the root storage. I ended up having to reinstall ECE because a cluster would not change its configuration, start up, or be deleted.
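For anyone hitting the same thing, a quick check like the one below on each host might catch a full root filesystem before it gets this far. It is a minimal sketch, not anything ECE-specific: the 90% threshold and the function names are my own assumptions, and it only compares / against /mnt/data using plain df.

```shell
#!/bin/sh
# Print the used-space percentage (integer, no % sign) of the
# filesystem that holds the given path, using POSIX df output.
fs_use_percent() {
  df -P "$1" | awk 'NR==2 { sub(/%/, "", $5); print $5 }'
}

# Warning threshold in percent; 90 is an arbitrary assumption.
THRESHOLD=${THRESHOLD:-90}

# Report usage for each path given, skipping paths that do not exist.
check_paths() {
  for path in "$@"; do
    if [ ! -d "$path" ]; then
      echo "skip: $path does not exist"
      continue
    fi
    used=$(fs_use_percent "$path")
    case "$used" in
      ''|*[!0-9]*) echo "skip: could not read usage for $path"; continue ;;
    esac
    if [ "$used" -ge "$THRESHOLD" ]; then
      echo "WARN: $path is ${used}% full"
    else
      echo "ok: $path is ${used}% full"
    fi
  done
}

# Root filesystem versus the data mount mentioned above.
check_paths / /mnt/data
```

Running it on every host shows at a glance whether the root filesystem is the one filling up while the data mount still has room.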
