ECE medium installation example, AWS, runner ip connectivity

I'm installing the second instance (step 3 here: https://www.elastic.co/guide/en/cloud-enterprise/current/ece-topology-example2.html) after the first instance seemed to install without problems.

The second instance complains:
Checking runner ip connectivity... FAILED
Can't connect $RUNNER_HOST_IP [10.5.3.195:22000]: Connection refused
[...]

Errors have caused Elastic Cloud Enterprise installation to fail
Some of the prerequisites failed: [runner ip connectivity], please fix before continuing

I checked on the first instance and there isn't anything listening on this port:

netstat -anpt|grep LISTEN|grep 22000

Any advice?

Possibly relevant context: I'm using r5.xlarge instances. I had to set up Docker manually because the cloud-init script that normally does this failed:
Failed running /var/lib/cloud/scripts/per-instance/00-format-drives-enable-docker
This was possibly because the storage device is /dev/nvme1n1.
Instead I manually ran parted mklabel, mkpart, mkfs.xfs, mkdir /mnt/data, install /mnt/data, mount, made the sysctl edits, ran systemctl restart docker...

Hi @mesiasc

I think that error means a Docker container running on the 2nd instance can't connect to its own host on that port (which is in the range we use for container-to-container comms). It has nothing to do with external connectivity.

I believe it does that check on every install, so your first instance passed. Is there any iptables / docker config difference between the two?

Thanks for the reply Alex.
All instances are created in an auto scaling group from identical config, although they are in different availability zones and subnets. In this case the IP given in the error message is the IP of the first instance.
I've stepped back to a single instance install but would really like to get the medium example up and running - I will reproduce this.

Alex,

I had to rebuild the ECE installation, and reproduced the problem.

The first instance was installed with
bash <(curl -fsSL https://download.elastic.co/cloud/elastic-cloud-enterprise.sh) install --availability-zone MY_ZONE-1 --memory-settings '{"runner":{"xms":"1G","xmx":"1G"},"allocator":{"xms":"4G","xmx":"4G"},"proxy":{"xms":"8G","xmx":"8G"},"zookeeper":{"xms":"4G","xmx":"4G"},"director":{"xms":"1G","xmx":"1G"},"constructor":{"xms":"4G","xmx":"4G"},"admin-console":{"xms":"4G","xmx":"4G"}}'

The second instance was installed with
bash <(curl -fsSL https://download.elastic.co/cloud/elastic-cloud-enterprise.sh) install --coordinator-host $HOST_IP --roles-token "$MY_TOKEN" --roles "director,coordinator,proxy,allocator" --availability-zone MY_ZONE-2 --memory-settings '{"runner":{"xms":"1G","xmx":"1G"},"allocator":{"xms":"4G","xmx":"4G"},"proxy":{"xms":"8G","xmx":"8G"},"zookeeper":{"xms":"4G","xmx":"4G"},"director":{"xms":"1G","xmx":"1G"},"constructor":{"xms":"4G","xmx":"4G"},"admin-console":{"xms":"4G","xmx":"4G"}}'

HOST_IP is the private IP address of the first instance, which is routable from the second. MY_TOKEN is set as described in the documentation. Nothing seems to be listening on port 22000 on either instance.

Any further ideas to diagnose this please?

Let me double check I understand this!

You happily install ECE on $host1

You then come to install a second host connecting to the first, say $host2

The $host2 install fails on the internal connectivity check (that's what 22000 is), BUT claiming it's trying to connect to $host1.ip? And you're 100% sure (sorry for asking this stupid question!) that $host1 and $host2 have different IPs?

So we use HOST_IP as an internal environment variable to pass around the IP of the host currently being installed, e.g. $host2 in the discussion above.

It looks like you are maybe overriding it in your scripting! It should work if you either:

  • Unset it
  • Set it to $host2's IP
  • Use --host-ip in the install CLI
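For anyone hitting the same thing, here is a rough sketch of the first option: keep the coordinator's address in a differently named variable so it can't collide with the installer's internal HOST_IP. The names COORDINATOR_IP and build_install_args are made up for illustration, the IP is the one from the error message earlier in the thread, and the --memory-settings argument is left out for brevity. The function only prints the arguments so they can be reviewed; it does not run the installer.

```shell
#!/usr/bin/env bash
# Sketch only: COORDINATOR_IP and build_install_args are hypothetical names.
# The flags are the ones already used earlier in this thread.

# Make sure no stale HOST_IP from the calling shell leaks into the installer.
unset HOST_IP

# Private IP of the FIRST instance (the coordinator), not of the host being installed.
COORDINATOR_IP="10.5.3.195"

# Print the installer arguments one per line for review.
# MY_TOKEN is assumed to be set as described in the documentation.
build_install_args() {
  printf '%s\n' \
    install \
    --coordinator-host "$COORDINATOR_IP" \
    --roles-token "$MY_TOKEN" \
    --roles "director,coordinator,proxy,allocator" \
    --availability-zone MY_ZONE-2
}

build_install_args
```

The printed arguments would then be passed to the installer script in the same way as in the second-instance command above.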

Alex,

Thanks for persevering, I should have thought of that! I have changed my scripting to not use HOST_IP and it now works nicely. :)