Error when creating cluster of 32GB


(weibin.wu) #1

Hi

I was creating a cluster with 3 AZs, 1 node per AZ, 32GB per node.
Then I got this error during creation:

#1
Jun 15, 2017, 3:51:36 PM GMT (50 seconds ago)
Unexpected error during step: [wait-until-running]: [no.found.constructor.models.ForcedAbort: Plan aborted by deletion of pending plan]
#0
Jun 15, 2017, 3:51:36 PM GMT (50 seconds ago)
Unexpected error during step: [allocate-instances]: [no.found.constructor.steps.allocation.AllocationFailedException: Allocation failed: [CoordinationFailed(org.apache.zookeeper.KeeperMultiException: KeeperErrorCode = BadVersion with [CHECK(/, null)), SET_DATA(/clusters/c38a2e8b055f48749ad3b19796cc123f/instances/instance-0000000011, null)), SET_DATA(/clusters/c38a2e8b055f48749ad3b19796cc123f/instances/instance-0000000007, null)), SET_DATA(/clusters/c38a2e8b055f48749ad3b19796cc123f/instances/instance-0000000006, null)), SET_DATA(/clusters/c38a2e8b055f48749ad3b19796cc123f/instances/instance-0000000009, null)), SET_DATA(/clusters/c38a2e8b055f48749ad3b19796cc123f/instances/instance-0000000010, null)), SET_DATA(/clusters/c38a2e8b055f48749ad3b19796cc123f/instances/instance-0000000008, null)), SET_DATA(/services/allocators/ece-region-1b-1/172.31.38.171/instances, null)), SET_DATA(/services/allocators/ece-region-1a-1/172.31.56.91/instances, null)), SET_DATA(/services/allocators/ece-region-1a-2/172.31.63.51/instances, null))])]]

The error message is actually a bit difficult to read. What could be the cause when I see this message?

The underlying EC2 instances will be 4 x m4.10xlarge.


(Alex Piggott) #2

Hi @weibin.wu

Can you run the following API calls for the cluster? They will return more info that will be very helpful:

curl 'http://readonly:READONLY_PASSWORD@URL:12400/api/v1/clusters/elasticsearch/CLUSTERID?show_plan_logs=true&show_metadata=true&show_plans=true'

and:

curl 'http://readonly:READONLY_PASSWORD@URL:12400/api/v1/clusters/elasticsearch/CLUSTERID/plan/activity?show_plan_logs=true'

(see https://www.elastic.co/guide/en/cloud-enterprise/current/get-es-cluster.html and https://www.elastic.co/guide/en/cloud-enterprise/current/get-es-cluster-plan-activity.html)

(Using the readonly user means that any secrets are automatically expunged from the API response.)
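If it helps, the two calls above can be wrapped in a small script that saves both responses to files. This is only a sketch: `ECE_HOST`, `CLUSTER_ID` and `READONLY_PASSWORD` are placeholders you need to set yourself, and the function names and output file names are made up for this example.

```shell
#!/usr/bin/env bash
# Sketch only: ECE_HOST, CLUSTER_ID and READONLY_PASSWORD are placeholders
# you must set yourself; the function and file names are hypothetical.

# Build the base URL for one Elasticsearch cluster on the API (port 12400).
ece_cluster_url() {
  local host="$1" cluster_id="$2"
  printf 'http://%s:12400/api/v1/clusters/elasticsearch/%s' "$host" "$cluster_id"
}

# Save the cluster info and the plan activity to two JSON files.
fetch_cluster_diagnostics() {
  local base
  base="$(ece_cluster_url "$ECE_HOST" "$CLUSTER_ID")"
  curl -sS -u "readonly:${READONLY_PASSWORD}" \
    "${base}?show_plan_logs=true&show_metadata=true&show_plans=true" \
    -o cluster.json
  curl -sS -u "readonly:${READONLY_PASSWORD}" \
    "${base}/plan/activity?show_plan_logs=true" \
    -o plan_activity.json
}
```

After sourcing the script and exporting the three variables, calling `fetch_cluster_diagnostics` should leave `cluster.json` and `plan_activity.json` in the current directory, ready to attach to a reply.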

It looks to me like the plan got cancelled (1), and the error under "#0" is just the constructor step getting confused as the plan gets rolled back (2). I could definitely be wrong, though.

  1. being [no.found.constructor.models.ForcedAbort: Plan aborted by deletion of pending plan]. That normally means that the cancel button has been pressed.

  2. being [no.found.constructor.steps.allocation.AllocationFailedException: Allocation failed: ...]. When instances are created, they register a "rollback" action to run if something goes wrong, and I believe the error is occurring there. If my memory is right, the rollback should be mentioned somewhere above this step in the UI.
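Since the second error is hard to read as one long line, here is a rough way to list just the ZooKeeper operations in it. `BadVersion` is ZooKeeper's error code for a version check that did not match, and the bracketed list is the set of operations in the multi-operation transaction that was rejected. This is a sketch that assumes the message keeps the `CHECK(path, ...)` / `SET_DATA(path, ...)` shape shown in the original post; the function name is made up.

```shell
#!/usr/bin/env bash
# Sketch only: list_zk_ops is a hypothetical helper, and it assumes the
# error text uses the CHECK(path, ...) / SET_DATA(path, ...) shape.

# Read an error message on stdin and print one "OPERATION path" per line.
list_zk_ops() {
  grep -oE '(CHECK|SET_DATA)\([^,]+' | sed -E 's/\(/ /'
}
```

Saving the message to a file and running `list_zk_ops < error.txt` on the error from the first post should print one `CHECK /` line, six `SET_DATA` lines for the instance paths and three for the allocator paths, which makes it easier to see which instances and allocators the failed transaction touched.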

It's not completely unheard of for a rollback step to fail when a plan fails or is cancelled. Normally this means that some unused resources are still allocated to the cluster, but they get removed by, e.g., a cluster shutdown, so this isn't considered very serious.

(edit: for more details)

