Elasticsearch networking with Podman

Hello,

Summary: I'm getting errors when creating an Elasticsearch cluster across multiple servers.
Setup: two RHEL 8 EC2 instances running Elasticsearch 8.5 in Podman containers
docker-compose.yml on server A:

  es03:
    image: localhost/elasticsearch
    container_name: es03
    volumes:
      - certs:/usr/share/elasticsearch/config/certs
      - esdata03:/usr/share/elasticsearch/data
    ports:
      - 9200:9200
      - 9300:9300
    environment:
      - ELASTIC_PASSWORD=${password}
      - node.name=es03
      - node.roles = [master]
      - cluster.name=${CLUSTER_NAME}
      - cluster.initial_master_nodes=["es03"]
      - bootstrap.memory_lock=true
      - xpack.security.enabled=true
      - xpack.security.http.ssl.enabled=true
      - xpack.security.http.ssl.key=certs/es03/es03.key
      - xpack.security.http.ssl.certificate=certs/es03/es03.crt
      - xpack.security.http.ssl.certificate_authorities=certs/ca/ca.crt
      - xpack.security.transport.ssl.enabled=true
      - xpack.security.transport.ssl.key=certs/es03/es03.key
      - xpack.security.transport.ssl.certificate=certs/es03/es03.crt
      - xpack.security.transport.ssl.certificate_authorities=certs/ca/ca.crt
      - xpack.security.transport.ssl.verification_mode=certificate
      - xpack.license.self_generated.type=${LICENSE}
      - network.host="0.0.0.0"
      - discovery.seed_hosts=["<serverA ip>", "<serverB ip>", "es02", "es03"]
      - ingest.geoip.downloader.enabled=false

docker-compose.yml on server B:

  es02:
    image: localhost/elasticsearch
    container_name: es02
    volumes:
      - certs:/usr/share/elasticsearch/config/certs
      - esdata02:/usr/share/elasticsearch/data
    ports:
      - 9200:9200
      - 9300:9300
    environment:
      - ELASTIC_PASSWORD=${password}
      - node.name=es02
      - node.roles = [data]
      - cluster.name=${CLUSTER_NAME}
      - cluster.initial_master_nodes=["es03"]
      - bootstrap.memory_lock=true
      - xpack.security.enabled=true
      - xpack.security.http.ssl.enabled=true
      - xpack.security.http.ssl.key=certs/es02/es02.key
      - xpack.security.http.ssl.certificate=certs/es02/es02.crt
      - xpack.security.http.ssl.certificate_authorities=certs/ca/ca.crt
      - xpack.security.transport.ssl.enabled=true
      - xpack.security.transport.ssl.key=certs/es02/es02.key
      - xpack.security.transport.ssl.certificate=certs/es02/es02.crt
      - xpack.security.transport.ssl.certificate_authorities=certs/ca/ca.crt
      - xpack.security.transport.ssl.verification_mode=certificate
      - xpack.license.self_generated.type=${LICENSE}
      - network.host=_site_
      - discovery.seed_hosts=["<serverA ip>", "<serverB ip>", "es02", "es03"]
      - ingest.geoip.downloader.enabled=false

I can start both nodes, but they do not discover each other. When I visit /_cat/nodes?v on server A, I see something like this (server B is similar), with each node showing its internal IP address:

ip         heap.percent ram.percent cpu load_1m load_5m load_15m node.role   master name
10.xx.x.xx           53          94  22    0.82    0.36     0.37 cdfhilmrstw *      es03

When I check the logs on server B, I get this:

WARN", "message":"completed handshake with [{es03}{xxxxxx}{-xxxxxx}{es03}{10.xx.x.xx}{10.xx.x.xx:9300}{cdfhilmrstw}] at [<serverA ip>:9300] but followup connection to [10.xx.x.xx:9300] failed"

I've tried:

  • Changing network.host in docker-compose.yml to eth0, "0.0.0.0" and <server A/B ip>
  • As a test, running both nodes in separate containers on the same server. This worked, but the IP addresses listed at /_cat/nodes?v were internal IPs, which seems wrong

Any ideas about what could be causing this issue, and how I can resolve it, would be much appreciated.

Thank you

The right answer is usually to specify the exact public IP address for network.host. All other possibilities involve heuristics that don't tend to work well with Docker. There's more info here in the reference manual.

For the sake of completeness there's also some guidance on discovery troubleshooting but it sounds like you've already done all that.

Thank you for getting back to me. I have tried setting network.host to the public IP, but Elasticsearch always fails to start with this error:

{"@timestamp":"2024-07-12T14:20:31.876Z", "log.level":"ERROR", "message":"fatal exception while booting Elasticsearch", "ecs.version": "1.2.0","service.name":"ES_ECS","event.dataset":"elasticsearch.server","process.thread.name":"main","log.logger":"org.elasticsearch.bootstrap.Elasticsearch","elasticsearch.node.name":"es03","elasticsearch.cluster.name":"docker-cluster","error.type":"org.elasticsearch.transport.BindTransportException","error.message":"Failed to bind to xxx.xxx.xxx:[9300-9399]","error.stack_trace":"org.elasticsearch.transport.BindTransportException: Failed to bind to xxx.xxx.xxx:[9300-9399]\n\tat org.elasticsearch.server@8.5.2/org.elasticsearch.transport.TcpTransport.bindToPort(TcpTransport.java:505)\n\tat org.elasticsearch.server@8.5.2/org.elasticsearch.transport.TcpTransport.bindServer(TcpTransport.java:466)\n\tat org.elasticsearch.transport.netty4@8.5.2/org.elasticsearch.transport.netty4.Netty4Transport.doStart(Netty4Transport.java:142)\n\tat org.elasticsearch.security@8.5.2/org.elasticsearch.xpack.core.security.transport.netty4.SecurityNetty4Transport.doStart(SecurityNetty4Transport.java:96)\n\tat org.elasticsearch.security@8.5.2/org.elasticsearch.xpack.security.transport.netty4.SecurityNetty4ServerTransport.doStart(SecurityNetty4ServerTransport.java:59)\n\tat org.elasticsearch.server@8.5.2/org.elasticsearch.common.component.AbstractLifecycleComponent.start(AbstractLifecycleComponent.java:43)\n\tat org.elasticsearch.server@8.5.2/org.elasticsearch.transport.TransportService.doStart(TransportService.java:311)\n\tat org.elasticsearch.server@8.5.2/org.elasticsearch.common.component.AbstractLifecycleComponent.start(AbstractLifecycleComponent.java:43)\n\tat org.elasticsearch.server@8.5.2/org.elasticsearch.node.Node.start(Node.java:1296)\n\tat org.elasticsearch.server@8.5.2/org.elasticsearch.bootstrap.Elasticsearch.start(Elasticsearch.java:436)\n\tat 
org.elasticsearch.server@8.5.2/org.elasticsearch.bootstrap.Elasticsearch.initPhase3(Elasticsearch.java:229)\n\tat org.elasticsearch.server@8.5.2/org.elasticsearch.bootstrap.Elasticsearch.main(Elasticsearch.java:67)\nCaused by: java.net.BindException: Cannot assign requested address\n\tat java.base/sun.nio.ch.Net.bind0(Native Method)\n\tat java.base/sun.nio.ch.Net.bind(Net.java:555)\n\tat java.base/sun.nio.ch.ServerSocketChannelImpl.netBind(ServerSocketChannelImpl.java:344)\n\tat java.base/sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:301)\n\tat io.netty.transport@4.1.77.Final/io.netty.channel.socket.nio.NioServerSocketChannel.doBind(NioServerSocketChannel.java:141)\n\tat io.netty.transport@4.1.77.Final/io.netty.channel.AbstractChannel$AbstractUnsafe.bind(AbstractChannel.java:562)\n\tat io.netty.transport@4.1.77.Final/io.netty.channel.DefaultChannelPipeline$HeadContext.bind(DefaultChannelPipeline.java:1334)\n\tat io.netty.transport@4.1.77.Final/io.netty.channel.AbstractChannelHandlerContext.invokeBind(AbstractChannelHandlerContext.java:506)\n\tat io.netty.transport@4.1.77.Final/io.netty.channel.AbstractChannelHandlerContext.bind(AbstractChannelHandlerContext.java:491)\n\tat io.netty.transport@4.1.77.Final/io.netty.channel.DefaultChannelPipeline.bind(DefaultChannelPipeline.java:973)\n\tat io.netty.transport@4.1.77.Final/io.netty.channel.AbstractChannel.bind(AbstractChannel.java:260)\n\tat io.netty.transport@4.1.77.Final/io.netty.bootstrap.AbstractBootstrap$2.run(AbstractBootstrap.java:356)\n\tat io.netty.common@4.1.77.Final/io.netty.util.concurrent.AbstractEventExecutor.runTask(AbstractEventExecutor.java:174)\n\tat io.netty.common@4.1.77.Final/io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:167)\n\tat io.netty.common@4.1.77.Final/io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:470)\n\tat 
io.netty.transport@4.1.77.Final/io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:503)\n\tat io.netty.common@4.1.77.Final/io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:995)\n\tat io.netty.common@4.1.77.Final/io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)\n\tat java.base/java.lang.Thread.run(Thread.java:1589)\n"}
ERROR: Elasticsearch did not exit normally - check the logs at /usr/share/elasticsearch/logs/docker-cluster.log

I think that's covered by the docs I already linked, but TL;DR: the simplest answer is probably to use Docker's host networking mode instead of bridge (the Docker docs note that bridge networking needs extra work when routing across hosts). Alternatively, you could set network.bind_host if for some reason you really need bridge networking.
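
If bridge networking has to stay, a minimal sketch of the bind_host approach might look like the fragment below. It assumes the goal is to bind on every interface inside the container while advertising the host's address to the other node. network.publish_host is not mentioned above and is added here as an assumption; <serverA ip> is the same placeholder used in the question.

```yaml
  es03:
    # Rest of the service definition unchanged.
    ports:
      - 9200:9200
      - 9300:9300
    environment:
      # Bind inside the container on all interfaces; the host's own
      # address is not assigned to any interface in a bridged container.
      - network.bind_host=0.0.0.0
      # Assumption: advertise an address the other server can reach
      # through the published 9300 port.
      - network.publish_host=<serverA ip>
```

This would explain the "Cannot assign requested address" error: network.host sets both the bind and the publish address, and in bridge mode the container cannot bind to an address that only exists on the host.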

Using host instead of bridge, along with setting network.host= worked. Thank you for your assistance.
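
For later readers, the working setup probably looks roughly like the fragment below. It is a sketch rather than the exact file used. It assumes the Compose implementation in use honours network_mode: host, it drops the ports mapping because published ports do not apply in host mode, and it shows network.host with the question's <serverA ip> placeholder because the value actually used is not given above.

```yaml
  es03:
    # Rest of the service definition unchanged, minus the "ports:" block.
    # Share the host's network stack instead of a container bridge.
    network_mode: host
    environment:
      # Placeholder: must be an address assigned to one of the host's
      # interfaces, otherwise the bind fails as in the error above.
      - network.host=<serverA ip>
```

With host networking, Elasticsearch listens on 9200 and 9300 directly on the server, so those ports must be free on the host and reachable from the other server.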