Waiting for nodes to stop - timeout

(Mats Iremark) #1

While resizing a three-node cluster, the plan never shows as completed because it times out while stopping the old nodes.

Change memory per node from 4 GB to 8 GB
Timeout for configuration change: 5h 33m
Unexpected error during step: [wait-until-stopped]: [no.found.constructor.models.TimeoutException: Timeout]

Starting step: [wait-until-stopped]:
instances=List(ElasticsearchInstance(ElasticsearchCluster(d2d1c12716a04716baadc3e6795d2ecc),instance-0000000011), ElasticsearchInstance(ElasticsearchCluster(d2d1c12716a04716baadc3e6795d2ecc),instance-0000000010), ElasticsearchInstance(ElasticsearchCluster(d2d1c12716a04716baadc3e6795d2ecc),instance-0000000009))

The new nodes show up fine, and both indexing and searching work.

What should I do here? Kill the Docker containers manually?

(Uri Cohen) #2

Hi @iremmats

This looks like a Docker related issue. Which version of ECE are you running?
In addition, can you please do the following:

  1. Fetch all the logs from the relevant allocator hosts and post them here. Use this command to archive them on each host:
    docker ps -a > /mnt/data/elastic/docker.out && tar czvf ece-logs.tgz $(find /mnt/data/elastic -name "*.log" -o -name "*.out")
  2. Try to resume the stopped instances, and then stop them manually.
  3. If that doesn't help, kill the old 4 GB containers on each host (the relevant container names are of the form fac-{cluster-id}-{instance-id}).
  4. If the plan is no longer running (because it failed on the timeout), try resubmitting it so that it cleans up the old instances.
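A minimal sketch of step 3, assuming bash with GNU or BSD `grep`/`sed` and the fac-{cluster-id}-{instance-id} naming above. The hypothetical `fac_names_from_log` helper extracts the instances listed in the `[wait-until-stopped]` log line and turns them into container names; the hypothetical `stop_or_kill` helper tries `docker stop` and falls back to `docker kill`. Neither helper is part of ECE.

```shell
#!/usr/bin/env bash

# Hypothetical helper: read a "[wait-until-stopped] instances=List(...)" log
# line on stdin and print one container name per instance, assuming the
# fac-{cluster-id}-{instance-id} naming scheme.
fac_names_from_log() {
  # Each instance appears as ElasticsearchCluster(<cluster-id>),<instance-id>
  grep -oE 'ElasticsearchCluster\([^)]+\),instance-[0-9]+' |
    sed -E 's/^ElasticsearchCluster\(([^)]+)\),(instance-[0-9]+)$/fac-\1-\2/'
}

# Hypothetical helper: for each container name on stdin, attempt a graceful
# stop (30 s grace period) and fall back to a kill if the stop fails.
# Must run on the allocator host that owns the containers.
stop_or_kill() {
  while read -r name; do
    docker stop --time 30 "$name" || docker kill "$name"
  done
}
```

Usage on an allocator host: `printf '%s\n' "$log_line" | fac_names_from_log | stop_or_kill`, where `$log_line` holds the `instances=List(...)` text from the plan log. Running `fac_names_from_log` alone first shows which containers would be affected.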

In any case, it would be good to see those logs so we can investigate further.


(Mats Iremark) #3

We are running Beta2.

  1. I got the logs. They are linked from my Dropbox.

  2. I resumed the instances and then stopped them in the admin console. It seems one of the allocators wasn't able to stop its container.

  3. Neither stopping nor killing the container worked. I found a few similar Docker-related issues, for example https://github.com/moby/moby/issues/208

I also tried attaching to the container, but nothing happened.

From our perspective as potential ECE customers, it is very important that ECE's Docker handling, and Docker itself, work flawlessly.

(Mats Iremark) #4

2637b394b726 docker.elastic.co/cloud-enterprise/elasticsearch:5.3.0-1 "/sbin/entry-point" 4 days ago Restarting (1) 30 hours ago>18438/tcp,>19296/tcp fac-d2d1c12716a04716baadc3e6795d2ecc-instance-0000000010

It seems to be stuck in a restarting state.
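To spot other containers stuck like this one, here is a minimal sketch using a hypothetical `restarting_containers` helper (not part of ECE). It reads `docker ps -a` output on stdin and prints the names of containers whose status is `Restarting`, assuming the default table layout where the container name is the last column.

```shell
#!/usr/bin/env bash

# Hypothetical helper: read `docker ps -a` output on stdin and print the
# NAMES column (last field) of every row whose STATUS reads "Restarting (N)".
# Assumes the default table format; the header row never matches.
restarting_containers() {
  awk '/ Restarting [(]/ { print $NF }'
}
```

Usage: `docker ps -a | restarting_containers`. On a live host, `docker ps -a --filter status=restarting --format '{{.Names}}'` should give the same list without any text parsing, which is the more robust choice. The awk version is mainly useful for a saved `docker.out` file like the one produced by the log-collection command.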

(Mats Iremark) #5

Restarting the server solved the Docker issue. It would be interesting to know what caused it, though.

(system) #6

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.