The second instance complains:
Checking runner ip connectivity... FAILED
Can't connect $RUNNER_HOST_IP [10.5.3.195:22000]: Connection refused
[...]
Errors have caused Elastic Cloud Enterprise installation to fail
Some of the prerequisites failed: [runner ip connectivity], please fix before continuing
I checked on the first instance and there isn't anything listening on this port:
netstat -anpt|grep LISTEN|grep 22000
Any advice?
Possibly relevant context: I'm using r5.xlarge instances. I had to set up Docker manually because the cloud-init script that does this failed:
Failed running /var/lib/cloud/scripts/per-instance/00-format-drives-enable-docker
Possibly because of the storage being /dev/nvme1n1
Instead I manually ran parted mklabel and mkpart, mkfs.xfs, mkdir /mnt/data, install /mnt/data, and mount, then made the sysctl edits and ran systemctl restart docker...
I think what that error means is that a Docker container running on the 2nd instance can't connect to its host on that port (which is in the range we use for container-to-container comms). It's nothing to do with external connectivity.
I believe it does that check on every install, so your first instance passed. Is there any iptables / docker config difference between the two?
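For a rough look at whether anything accepts connections on that port, something like the following could be run on the host. This is only a sketch and not what the installer itself runs: port_open is a hypothetical helper name, RUNNER_HOST_IP is a placeholder for the IP from the error message, and running it on the host only approximates a check that really happens from inside a container.

```shell
#!/usr/bin/env bash
# Hypothetical helper: probe a TCP port using bash's /dev/tcp redirection.
# Returns 0 if something accepts the connection, non-zero otherwise.
port_open() {
  local host="$1" port="$2"
  # timeout guards against filtered ports that would otherwise hang
  timeout 3 bash -c "exec 3<>/dev/tcp/${host}/${port}" 2>/dev/null
}

# RUNNER_HOST_IP is a placeholder; set it to the IP from the error message.
if port_open "${RUNNER_HOST_IP:-127.0.0.1}" 22000; then
  echo "port 22000 reachable"
else
  echo "port 22000 not reachable"
fi
```

If the port is refused from the host as well, comparing the output of iptables -S and docker info on both instances would be a reasonable next step.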
Thanks for the reply Alex.
All instances are created in an auto scaling group from identical config, although they are in different availability zones and subnets. In this case the IP given in the error message is the IP of the first instance.
I've stepped back to a single instance install but would really like to get the medium example up and running - I will reproduce this.
I had to rebuild the ECE installation, and reproduced the problem.
The first instance was installed with
bash <(curl -fsSL https://download.elastic.co/cloud/elastic-cloud-enterprise.sh) install --availability-zone MY_ZONE-1 --memory-settings '{"runner":{"xms":"1G","xmx":"1G"},"allocator":{"xms":"4G","xmx":"4G"},"proxy":{"xms":"8G","xmx":"8G"},"zookeeper":{"xms":"4G","xmx":"4G"},"director":{"xms":"1G","xmx":"1G"},"constructor":{"xms":"4G","xmx":"4G"},"admin-console":{"xms":"4G","xmx":"4G"}}'
The second instance was installed with
bash <(curl -fsSL https://download.elastic.co/cloud/elastic-cloud-enterprise.sh) install --coordinator-host $HOST_IP --roles-token "$MY_TOKEN" --roles "director,coordinator,proxy,allocator" --availability-zone MY_ZONE-2 --memory-settings '{"runner":{"xms":"1G","xmx":"1G"},"allocator":{"xms":"4G","xmx":"4G"},"proxy":{"xms":"8G","xmx":"8G"},"zookeeper":{"xms":"4G","xmx":"4G"},"director":{"xms":"1G","xmx":"1G"},"constructor":{"xms":"4G","xmx":"4G"},"admin-console":{"xms":"4G","xmx":"4G"}}'
HOST_IP is the private IP address of the first instance, which is routable from the second. MY_TOKEN is set as described in the documentation. Nothing seems to be listening on port 22000 on either instance.
You then come to install a second host connecting to the first, say $host2
The $host2 install fails on the internal connectivity check (that's what 22000 is), BUT claiming it's trying to connect to $host1.ip? And you're 100% sure (sorry for asking this stupid question!) that $host1 and $host2 have different IPs?
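A quick way to settle the IP question on each host might look like this. It is a sketch only: extract_ip and ip_is_local are hypothetical helper names, the error line is copied from the first post, and hostname -I assumes a Linux host.

```shell
# Pull the first IPv4 address out of the installer's "Can't connect" line.
extract_ip() {
  printf '%s\n' "$1" | grep -oE '[0-9]{1,3}(\.[0-9]{1,3}){3}' | head -n 1
}

# Succeeds if the first argument equals any of the remaining arguments.
ip_is_local() {
  local target="$1" addr
  shift
  for addr in "$@"; do
    [ "$addr" = "$target" ] && return 0
  done
  return 1
}

err_line="Can't connect \$RUNNER_HOST_IP [10.5.3.195:22000]: Connection refused"
failed_ip="$(extract_ip "$err_line")"

# hostname -I prints this host's addresses; word splitting is intended here.
if ip_is_local "$failed_ip" $(hostname -I 2>/dev/null); then
  echo "$failed_ip belongs to this host"
else
  echo "$failed_ip is NOT one of this host's addresses"
fi
```

Run on $host2, a "belongs to this host" result would mean the check is probing $host2 itself, while "NOT one of this host's addresses" would confirm it really is pointing at another machine.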