Hello,
Summary: I'm getting errors when creating an Elasticsearch cluster across multiple servers.
Setup: 2 RHEL 8 EC2 instances running Elasticsearch 8.5 in Podman containers.
docker-compose.yml on server A:
es03:
  image: localhost/elasticsearch
  container_name: es03
  volumes:
    - certs:/usr/share/elasticsearch/config/certs
    - esdata03:/usr/share/elasticsearch/data
  ports:
    - 9200:9200
    - 9300:9300
  environment:
    - ELASTIC_PASSWORD=${password}
    - node.name=es03
    - node.roles = [master]
    - cluster.name=${CLUSTER_NAME}
    - cluster.initial_master_nodes=["es03"]
    - bootstrap.memory_lock=true
    - xpack.security.enabled=true
    - xpack.security.http.ssl.enabled=true
    - xpack.security.http.ssl.key=certs/es03/es03.key
    - xpack.security.http.ssl.certificate=certs/es03/es03.crt
    - xpack.security.http.ssl.certificate_authorities=certs/ca/ca.crt
    - xpack.security.transport.ssl.enabled=true
    - xpack.security.transport.ssl.key=certs/es03/es03.key
    - xpack.security.transport.ssl.certificate=certs/es03/es03.crt
    - xpack.security.transport.ssl.certificate_authorities=certs/ca/ca.crt
    - xpack.security.transport.ssl.verification_mode=certificate
    - xpack.license.self_generated.type=${LICENSE}
    - network.host="0.0.0.0"
    - xpack.security.transport.ssl.verification_mode=certificate
    - discovery.seed_hosts=["<serverA ip>", "<serverB ip>", "es02", "es03"]
    - ingest.geoip.downloader.enabled=false
docker-compose.yml on server B:
es02:
  image: localhost/elasticsearch
  container_name: es02
  volumes:
    - certs:/usr/share/elasticsearch/config/certs
    - esdata02:/usr/share/elasticsearch/data
  ports:
    - 9200:9200
    - 9300:9300
  environment:
    - ELASTIC_PASSWORD=${password}
    - node.name=es02
    - node.roles = [data]
    - cluster.name=${CLUSTER_NAME}
    - cluster.initial_master_nodes=["es03"]
    - bootstrap.memory_lock=true
    - xpack.security.enabled=true
    - xpack.security.http.ssl.enabled=true
    - xpack.security.http.ssl.key=certs/es02/es02.key
    - xpack.security.http.ssl.certificate=certs/es02/es02.crt
    - xpack.security.http.ssl.certificate_authorities=certs/ca/ca.crt
    - xpack.security.transport.ssl.enabled=true
    - xpack.security.transport.ssl.key=certs/es02/es02.key
    - xpack.security.transport.ssl.certificate=certs/es02/es02.crt
    - xpack.security.transport.ssl.certificate_authorities=certs/ca/ca.crt
    - xpack.security.transport.ssl.verification_mode=certificate
    - xpack.license.self_generated.type=${LICENSE}
    - network.host=_site_
    - xpack.security.transport.ssl.verification_mode=certificate
    - discovery.seed_hosts=["<serverA ip>", "<serverB ip>", "es02", "es03"]
    - ingest.geoip.downloader.enabled=false
I can start up both nodes, but they do not discover each other. When I visit /_cat/nodes?v on server A I see something like the following (server B looks similar), with each node showing its internal IP address:
ip         heap.percent ram.percent cpu load_1m load_5m load_15m node.role   master name
10.xx.x.xx           53          94  22    0.82    0.36     0.37 cdfhilmrstw *      es03
When I check the logs on server B, I get this:
WARN", "message":"completed handshake with [{es03}{xxxxxx}{-xxxxxx}{es03}{10.xx.x.xx}{10.xx.x.xx:9300}{cdfhilmrstw}] at [<serverA ip>:9300] but followup connection to [10.xx.x.xx:9300] failed"
I've tried:
Changing network.host in docker-compose.yml to eth0, "0.0.0.0" and <server A/B ip>.
As a test, running both nodes in separate containers on the same server. That worked, but the IP addresses listed at /_cat/nodes?v were internal IPs, which seems wrong.
Any ideas about what could be causing this issue and how I can resolve it would be much appreciated.
Thank you
Mason_Keresty:
changing network.host in docker-compose.yml to eth0 , "0.0.0.0" and <server A/B ip>
The right answer is usually to specify the exact public IP address for network.host; all other possibilities involve some heuristics that don't tend to work well with Docker. There's more info in the reference manual.
For the sake of completeness, there's also some guidance on discovery troubleshooting, but it sounds like you've already done all that.
Thank you for getting back to me. I have tried setting network.host to the public IP, but I always get this error:
{"@timestamp":"2024-07-12T14:20:31.876Z", "log.level":"ERROR", "message":"fatal exception while booting Elasticsearch", "ecs.version": "1.2.0","service.name":"ES_ECS","event.dataset":"elasticsearch.server","process.thread.name":"main","log.logger":"org.elasticsearch.bootstrap.Elasticsearch","elasticsearch.node.name":"es03","elasticsearch.cluster.name":"docker-cluster","error.type":"org.elasticsearch.transport.BindTransportException","error.message":"Failed to bind to xxx.xxx.xxx:[9300-9399]","error.stack_trace":"org.elasticsearch.transport.BindTransportException: Failed to bind to xxx.xxx.xxx:[9300-9399]\n\tat org.elasticsearch.server@8.5.2/org.elasticsearch.transport.TcpTransport.bindToPort(TcpTransport.java:505)\n\tat org.elasticsearch.server@8.5.2/org.elasticsearch.transport.TcpTransport.bindServer(TcpTransport.java:466)\n\tat org.elasticsearch.transport.netty4@8.5.2/org.elasticsearch.transport.netty4.Netty4Transport.doStart(Netty4Transport.java:142)\n\tat org.elasticsearch.security@8.5.2/org.elasticsearch.xpack.core.security.transport.netty4.SecurityNetty4Transport.doStart(SecurityNetty4Transport.java:96)\n\tat org.elasticsearch.security@8.5.2/org.elasticsearch.xpack.security.transport.netty4.SecurityNetty4ServerTransport.doStart(SecurityNetty4ServerTransport.java:59)\n\tat org.elasticsearch.server@8.5.2/org.elasticsearch.common.component.AbstractLifecycleComponent.start(AbstractLifecycleComponent.java:43)\n\tat org.elasticsearch.server@8.5.2/org.elasticsearch.transport.TransportService.doStart(TransportService.java:311)\n\tat org.elasticsearch.server@8.5.2/org.elasticsearch.common.component.AbstractLifecycleComponent.start(AbstractLifecycleComponent.java:43)\n\tat org.elasticsearch.server@8.5.2/org.elasticsearch.node.Node.start(Node.java:1296)\n\tat org.elasticsearch.server@8.5.2/org.elasticsearch.bootstrap.Elasticsearch.start(Elasticsearch.java:436)\n\tat 
org.elasticsearch.server@8.5.2/org.elasticsearch.bootstrap.Elasticsearch.initPhase3(Elasticsearch.java:229)\n\tat org.elasticsearch.server@8.5.2/org.elasticsearch.bootstrap.Elasticsearch.main(Elasticsearch.java:67)\nCaused by: java.net.BindException: Cannot assign requested address\n\tat java.base/sun.nio.ch.Net.bind0(Native Method)\n\tat java.base/sun.nio.ch.Net.bind(Net.java:555)\n\tat java.base/sun.nio.ch.ServerSocketChannelImpl.netBind(ServerSocketChannelImpl.java:344)\n\tat java.base/sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:301)\n\tat io.netty.transport@4.1.77.Final/io.netty.channel.socket.nio.NioServerSocketChannel.doBind(NioServerSocketChannel.java:141)\n\tat io.netty.transport@4.1.77.Final/io.netty.channel.AbstractChannel$AbstractUnsafe.bind(AbstractChannel.java:562)\n\tat io.netty.transport@4.1.77.Final/io.netty.channel.DefaultChannelPipeline$HeadContext.bind(DefaultChannelPipeline.java:1334)\n\tat io.netty.transport@4.1.77.Final/io.netty.channel.AbstractChannelHandlerContext.invokeBind(AbstractChannelHandlerContext.java:506)\n\tat io.netty.transport@4.1.77.Final/io.netty.channel.AbstractChannelHandlerContext.bind(AbstractChannelHandlerContext.java:491)\n\tat io.netty.transport@4.1.77.Final/io.netty.channel.DefaultChannelPipeline.bind(DefaultChannelPipeline.java:973)\n\tat io.netty.transport@4.1.77.Final/io.netty.channel.AbstractChannel.bind(AbstractChannel.java:260)\n\tat io.netty.transport@4.1.77.Final/io.netty.bootstrap.AbstractBootstrap$2.run(AbstractBootstrap.java:356)\n\tat io.netty.common@4.1.77.Final/io.netty.util.concurrent.AbstractEventExecutor.runTask(AbstractEventExecutor.java:174)\n\tat io.netty.common@4.1.77.Final/io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:167)\n\tat io.netty.common@4.1.77.Final/io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:470)\n\tat 
io.netty.transport@4.1.77.Final/io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:503)\n\tat io.netty.common@4.1.77.Final/io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:995)\n\tat io.netty.common@4.1.77.Final/io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)\n\tat java.base/java.lang.Thread.run(Thread.java:1589)\n"}
ERROR: Elasticsearch did not exit normally - check the logs at /usr/share/elasticsearch/logs/docker-cluster.log
I think that's covered by the docs I already linked, but TL;DR: the simplest answer is probably to use Docker's host networking mode instead of bridge (noting that the Docker docs say that bridge needs extra work when routing across hosts). Or you could set network.bind_host if for some reason you really need bridge networking.
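For anyone who does need to stay on bridge networking, here is a rough sketch of what the bind/publish split might look like for es03. It is not taken from the thread: network.publish_host is an addition on my part, and it assumes <serverA ip> is an address that server B can actually reach on port 9300. Only the lines that differ from the original compose file are shown.

```yaml
es03:
  ports:
    - 9200:9200
    - 9300:9300
  environment:
    # Bind to all interfaces inside the container; the public IP does not
    # exist there, which is why binding to it fails with
    # "Cannot assign requested address".
    - network.bind_host=0.0.0.0
    # Assumption: advertise the host's reachable address to other nodes
    # instead of the container's internal 10.x address.
    - network.publish_host=<serverA ip>
```

The idea is that the node listens on whatever address the container has, while telling its peers to connect back through the published port on the host. The same pattern would apply to es02 with <serverB ip>.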
Using host instead of bridge, along with setting network.host= worked. Thank you for your assistance.
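For later readers, a minimal sketch of what the host-networking variant might look like for es03. The exact network.host value used above is not shown, so <serverA ip> is a placeholder and an assumption; whatever is used has to be an address that is actually assigned to one of the host's interfaces, otherwise the bind fails as in the earlier stack trace. All other settings from the original file are omitted for brevity.

```yaml
es03:
  image: localhost/elasticsearch
  container_name: es03
  # Share the host's network stack instead of a bridge network.
  network_mode: host
  # No "ports:" mapping here: with host networking the container
  # listens on the host's 9200 and 9300 directly.
  environment:
    - node.name=es03
    # Assumption: placeholder for an address present on the host.
    - network.host=<serverA ip>
    - discovery.seed_hosts=["<serverA ip>", "<serverB ip>"]
```

With this setup the publish address matches the address other nodes connect to, so the "followup connection ... failed" warning from the first post should no longer apply.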