While resizing a three-node cluster, the action never shows as completed because of a timeout when stopping the nodes.
Change memory per node from 4 GB to 8 GB
Timeout for configuration change: 5h 33m
Unexpected error during step: [wait-until-stopped]: [no.found.constructor.models.TimeoutException: Timeout]
This looks like a Docker-related issue. Which version of ECE are you running?
In addition, can you please do the following:
Fetch all the logs from the relevant allocator hosts and post them here. Use this command to zip them on each host:
docker ps -a > /mnt/data/elastic/docker.out && tar czvf ece-logs.tgz $(find /mnt/data/elastic -name "*.log" -o -name "*.out")
Try to resume the stopped instances, and then stop them manually.
If that doesn't help, kill the old 4 GB containers on each host (the relevant container names are of the form fac-{cluster-id}-{instance-id}).
If the plan is not running (failed because of a timeout), try to resubmit it so it cleans up the old instances.
In any case, it would be good to see those logs so we can investigate further.
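For the container cleanup step, here is a rough sketch of how the matching containers could be listed and killed on one host. The function names and the DRY_RUN switch are made up for illustration; only the fac-{cluster-id}-{instance-id} naming comes from the advice above. Note that the name filter matches every container for that cluster on the host, so check which ones are the old instances before killing anything.

```shell
# Sketch only: helper names and DRY_RUN are hypothetical, not part of ECE.

# Print the names of all containers (running or not) whose name
# contains fac-<cluster-id>-.
list_cluster_containers() {
  local cluster_id="$1"
  docker ps -a --filter "name=fac-${cluster_id}-" --format '{{.Names}}'
}

# Kill each matching container. With DRY_RUN=1 (the default) this only
# prints what would be killed, so the list can be reviewed first.
kill_cluster_containers() {
  local cluster_id="$1"
  local name
  list_cluster_containers "$cluster_id" | while read -r name; do
    if [ "${DRY_RUN:-1}" = "1" ]; then
      echo "would kill: $name"
    else
      docker kill "$name"
    fi
  done
}
```

Running kill_cluster_containers with the cluster ID first prints the candidates; running it again with DRY_RUN=0 set in front actually kills them.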
I resumed the instances in the admin console and then stopped them there. It seems one of the allocators wasn't able to stop the container:
2637b394b726 docker.elastic.co/cloud-enterprise/elasticsearch:5.3.0-1 "/sbin/entry-point" 4 days ago Restarting (1) 30 hours ago 0.0.0.0:18438->18438/tcp, 0.0.0.0:19296->19296/tcp fac-d2d1c12716a04716baadc3e6795d2ecc-instance-0000000010
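To spot containers stuck like the one above on the other hosts, something along these lines could be run against the saved docker ps -a listing. This is a sketch: the function name is hypothetical, and it assumes the default docker ps column layout where the container name is the last field.

```shell
# Sketch only: restarting_containers is a hypothetical helper.
# Print the container name (last column) of every line in a saved
# `docker ps -a` listing whose status reads "Restarting (N)".
restarting_containers() {
  local listing="$1"
  awk '/ Restarting \([0-9]+\)/ { print $NF }' "$listing"
}
```

For example, restarting_containers /mnt/data/elastic/docker.out prints one name per stuck container, and docker logs --tail 50 with that name may show why it keeps restarting.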