Overriding `tcp.publish_port` breaks clustering when elasticsearch is in a container

I'm trying to run an Elasticsearch cluster with each es-node running in its own container. These containers are deployed with ECS across several machines that may be running other, unrelated containers. To avoid port conflicts, each port a container exposes is mapped to a random host port. These random ports are consistent across all running containers of the same type; in other words, all running es-node containers map port 9300 to the same random host port.

Here's the config I'm using:

```yaml
network:
  host: 0.0.0.0

plugin:
  mandatory: cloud-aws

cluster:
  name: ${ES_CLUSTER_NAME}

discovery:
  type: ec2
  ec2:
    groups: ${ES_SECURITY_GROUP}
    any_group: false
  zen.ping.multicast.enabled: false

transport:
  tcp.port: 9300
  publish_port: ${_INSTANCE_PORT_TRANSPORT}

cloud.aws:
  access_key: ${AWS_ACCESS_KEY}
  secret_key: ${AWS_SECRET_KEY}
  region: ${AWS_REGION}
```

In this case `_INSTANCE_PORT_TRANSPORT` is the host port that the container's port 9300 is bound to. I've confirmed that all the environment variables used above are set correctly. I'm also setting `network.publish_host` to the host machine's local IP via a command-line argument.
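
For reference, the node is started with something along these lines (a sketch only; the exact flag syntax depends on the ES version, and `LOCAL_IP` is a hypothetical variable holding the host's private IP):

```sh
# Sketch: pre-5.x Elasticsearch accepts settings as -Des.* system properties.
# LOCAL_IP is assumed to hold the host machine's private IP address.
bin/elasticsearch -Des.network.publish_host=${LOCAL_IP}
```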

When I forced `_INSTANCE_PORT_TRANSPORT` (and in turn `transport.publish_port`) to be 9300, everything worked great, but as soon as it's given a random value, nodes can no longer connect to each other. With `logger.discovery` set to `TRACE`, I see errors like this:

```
ConnectTransportException[[][10.0.xxx.xxx:9300] connect_timeout[30s]]; nested: ConnectException[Connection refused: /10.0.xxx.xxx:9300];
	at org.elasticsearch.transport.netty.NettyTransport.connectToChannelsLight(NettyTransport.java:952)
	at org.elasticsearch.transport.netty.NettyTransport.connectToNode(NettyTransport.java:916)
	at org.elasticsearch.transport.netty.NettyTransport.connectToNodeLight(NettyTransport.java:888)
	at org.elasticsearch.transport.TransportService.connectToNodeLight(TransportService.java:267)
	at org.elasticsearch.discovery.zen.ping.unicast.UnicastZenPing$3.run(UnicastZenPing.java:395)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:745)
```

It seems like the port a node binds to is also the port it pings when trying to connect to other nodes. Is there any way to make them different? If not, what's the point of `transport.publish_port`?

Here are the full logs: https://gist.github.com/xavi-/6ecc4ba16b39680fb28c8fb25307bcc7

Hi @xavi,

I've seen that you've asked the same question on Stack Overflow, and I'm mentioning it here just for reference.

Neither ES nor the AWS plugin is container-aware. Mapping container port 9300 to host port 9300 only worked because both sides used the same port. If the mapping differs, the typical solution is a dedicated cluster manager that "knows" the cluster topology. This blog article about Docker networking from our engineering team (and the Docker overlay networking mentioned in it) should get you started.

Another solution could be the Elasticsearch Mesos framework (not maintained by Elastic).

Daniel

I don't think you understand the problem. In a cluster, there's no guarantee that port 9300 (or any specific port) is available on a host machine at deploy time. Yes, one possible solution is to give each container its own IP, but unfortunately that's not supported in Amazon ECS. Instead, Amazon's approach is to map each port a container requests to a random host port.

So, for example, container port 9300 may get mapped to host port 10137. As a result you effectively end up with an ES process that can only receive TCP connections on port 10137 but broadcasts that other nodes should use port 9300 when connecting to it. This is obviously broken. It doesn't sound like ES has a viable solution, which is regrettable.
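
To make the mismatch concrete, the mapping ECS picks looks roughly like this (a hypothetical illustration; the host port 10137 and the image tag are made up):

```sh
# Hypothetical mapping chosen at deploy time: host port 10137 -> container port 9300.
# The image tag is just a placeholder.
docker run -p 10137:9300 elasticsearch:2.4
# Inside the container ES binds to 9300; peers can only reach it on <host-ip>:10137,
# yet the discovery pings in the errors above still target <host-ip>:9300.
```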

Hi,

I get the problem, but my point still holds: none of the plugins is container-aware, and for good reason. The standard practice is to use a dedicated cluster manager. The container should be treated as a jail, and the software running inside it shouldn't be aware of the host it is running on (among other things, for security reasons).

That is true, but you can also specify a fixed port mapping when you create a task (see the ECS docs). So if you don't want to use a cluster manager, I'd suggest going down that route.
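
For illustration, a fixed mapping in the task definition might look roughly like this (a sketch only; the container name and image are placeholders, and the full schema is in the ECS task definition docs):

```jsonc
// Sketch of one containerDefinitions entry with fixed host ports.
// "es-node" and the image tag are illustrative placeholders.
{
  "name": "es-node",
  "image": "elasticsearch:2.4",
  "portMappings": [
    { "containerPort": 9300, "hostPort": 9300, "protocol": "tcp" },
    { "containerPort": 9200, "hostPort": 9200, "protocol": "tcp" }
  ]
}
```

The trade-off is that only one container using a given fixed host port can run on each instance, which is the scheduling constraint the random mapping was avoiding.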

I hope that helps.

Daniel