I'm trying to set up a simple Elasticsearch cluster using 2 GCE VMs (test-elk1 and test-elk2), with the nodes running inside Docker containers.
I've set up firewall rules so they are completely visible to each other, so running
curl http://<instance_internal_IP>:9200
and
curl http://test-elk1:9200
from test-elk2 works just fine.
The problem is that they're not forming a cluster, and the logs are not clear on the reason why:
root@test-elk2:/home/yago/elk/es-data-node# docker-compose logs -f | grep '"level": "DEBUG"'
es-data_1 | {"type": "server", "timestamp": "2020-07-10T22:14:25,277Z", "level": "DEBUG", "component": "o.e.d.z.ElectMasterService", "cluster.name": "test-elk", "node.name": "test-elk2", "message": "using minimum_master_nodes [-1]" }
es-data_1 | {"type": "server", "timestamp": "2020-07-10T22:14:32,431Z", "level": "DEBUG", "component": "o.e.a.ActionModule", "cluster.name": "test-elk", "node.name": "test-elk2", "message": "Using REST wrapper from plugin org.elasticsearch.xpack.security.Security" }
es-data_1 | {"type": "server", "timestamp": "2020-07-10T22:14:33,151Z", "level": "DEBUG", "component": "o.e.d.SettingsBasedSeedHostsProvider", "cluster.name": "test-elk", "node.name": "test-elk2", "message": "using initial hosts [test-elk1]" }
es-data_1 | {"type": "server", "timestamp": "2020-07-10T22:14:35,737Z", "level": "DEBUG", "component": "o.e.t.n.Netty4Transport", "cluster.name": "test-elk", "node.name": "test-elk2", "message": "using profile[default], worker_count[2], port[9300-9400], bind_host[[0.0.0.0]], publish_host[[]], receive_predictor[64kb->64kb]" }
es-data_1 | {"type": "server", "timestamp": "2020-07-10T22:14:35,758Z", "level": "DEBUG", "component": "o.e.t.TcpTransport", "cluster.name": "test-elk", "node.name": "test-elk2", "message": "binding server bootstrap to: [0.0.0.0]" }
es-data_1 | {"type": "server", "timestamp": "2020-07-10T22:14:35,935Z", "level": "DEBUG", "component": "o.e.t.TcpTransport", "cluster.name": "test-elk", "node.name": "test-elk2", "message": "Bound profile [default] to address {0.0.0.0:9300}" }
es-data_1 | {"type": "server", "timestamp": "2020-07-10T22:14:36,000Z", "level": "DEBUG", "component": "o.e.d.SeedHostsResolver", "cluster.name": "test-elk", "node.name": "test-elk2", "message": "using max_concurrent_resolvers [10], resolver timeout [5s]" }
es-data_1 | {"type": "server", "timestamp": "2020-07-10T22:15:06,618Z", "level": "DEBUG", "component": "o.e.d.PeerFinder", "cluster.name": "test-elk", "node.name": "test-elk2", "message": "Peer{transportAddress=<test-elk1 internal IP>:9300, discoveryNode=null, peersRequestInFlight=false} connection failed",
es-data_1 | {"type": "server", "timestamp": "2020-07-10T22:15:37,260Z", "level": "DEBUG", "component": "o.e.d.PeerFinder", "cluster.name": "test-elk", "node.name": "test-elk2", "message": "Peer{transportAddress=<test-elk1 internal IP>:9300, discoveryNode=null, peersRequestInFlight=false} connection failed",
es-data_1 | {"type": "server", "timestamp": "2020-07-10T22:16:07,360Z", "level": "DEBUG", "component": "o.e.d.PeerFinder", "cluster.name": "test-elk", "node.name": "test-elk2", "message": "Peer{transportAddress=<test-elk1 internal IP>:9300, discoveryNode=null, peersRequestInFlight=false} connection failed",
es-data_1 | {"type": "server", "timestamp": "2020-07-10T22:16:37,481Z", "level": "DEBUG", "component": "o.e.d.PeerFinder", "cluster.name": "test-elk", "node.name": "test-elk2", "message": "Peer{transportAddress=<test-elk1 internal IP>:9300, discoveryNode=null, peersRequestInFlight=false} connection failed",
es-data_1 | {"type": "server", "timestamp": "2020-07-10T22:17:08,538Z", "level": "DEBUG", "component": "o.e.d.PeerFinder", "cluster.name": "test-elk", "node.name": "test-elk2", "message": "Peer{transportAddress=<test-elk1 internal IP>:9300, discoveryNode=null, peersRequestInFlight=false} connection failed",
[...]
Some other logs at TRACE level show that it connects and then "unregisters" the connection:
root@test-elk2:/home/yago/elk/es-data-node# docker-compose logs | grep 0x7ed0e599
es-data_1 | {"type": "server", "timestamp": "2020-07-10T22:39:38,368Z", "level": "TRACE", "component": "o.e.t.n.ESLoggingHandler", "cluster.name": "test-elk", "node.name": "test-elk2", "message": "[id: 0x7ed0e599] REGISTERED" }
es-data_1 | {"type": "server", "timestamp": "2020-07-10T22:39:38,371Z", "level": "TRACE", "component": "o.e.t.n.ESLoggingHandler", "cluster.name": "test-elk", "node.name": "test-elk2", "message": "[id: 0x7ed0e599] CONNECT: 192.168.96.2/192.168.96.2:9300" }
es-data_1 | {"type": "server", "timestamp": "2020-07-10T22:40:08,378Z", "level": "TRACE", "component": "o.e.t.n.ESLoggingHandler", "cluster.name": "test-elk", "node.name": "test-elk2", "message": "[id: 0x7ed0e599] CLOSE" }
es-data_1 | {"type": "server", "timestamp": "2020-07-10T22:40:08,379Z", "level": "TRACE", "component": "o.e.t.n.ESLoggingHandler", "cluster.name": "test-elk", "node.name": "test-elk2", "message": "[id: 0x7ed0e599] CLOSE" }
es-data_1 | {"type": "server", "timestamp": "2020-07-10T22:40:08,380Z", "level": "TRACE", "component": "o.e.t.n.ESLoggingHandler", "cluster.name": "test-elk", "node.name": "test-elk2", "message": "[id: 0x7ed0e599] UNREGISTERED" }
I can also see from the trace that the IP it connects to is the address test-elk1 is publishing, but since that address is internal to the Docker network on test-elk1, it's unreachable from test-elk2:
root@test-elk1:/home/yago/elk# docker-compose logs --tail=15 elasticsearch
Attaching to elk_elasticsearch_1
elasticsearch_1 | {"type": "server", "timestamp": "2020-07-10T22:02:55,528Z", "level": "INFO", "component": "o.e.d.DiscoveryModule", "cluster.name": "test-elk", "node.name": "test-elk1", "message": "using discovery type [zen] and seed hosts providers [settings]" }
elasticsearch_1 | {"type": "server", "timestamp": "2020-07-10T22:02:58,569Z", "level": "INFO", "component": "o.e.n.Node", "cluster.name": "test-elk", "node.name": "test-elk1", "message": "initialized" }
elasticsearch_1 | {"type": "server", "timestamp": "2020-07-10T22:02:58,572Z", "level": "INFO", "component": "o.e.n.Node", "cluster.name": "test-elk", "node.name": "test-elk1", "message": "starting ..." }
elasticsearch_1 | {"type": "server", "timestamp": "2020-07-10T22:02:59,006Z", "level": "INFO", "component": "o.e.t.TransportService", "cluster.name": "test-elk", "node.name": "test-elk1", "message": "publish_address {192.168.96.2:9300}, bound_addresses {0.0.0.0:9300}" }
elasticsearch_1 | {"type": "server", "timestamp": "2020-07-10T22:02:59,037Z", "level": "INFO", "component": "o.e.b.BootstrapChecks", "cluster.name": "test-elk", "node.name": "test-elk1", "message": "bound or publishing to a non-loopback address, enforcing bootstrap checks" }
elasticsearch_1 | {"type": "server", "timestamp": "2020-07-10T22:02:59,141Z", "level": "INFO", "component": "o.e.c.c.Coordinator", "cluster.name": "test-elk", "node.name": "test-elk1", "message": "cluster UUID [drqPHHGvQ66kEBW2DnR7dA]" }
I can also see from the trace logs that test-elk2 manages to connect to test-elk1's real internal IP address and reads some properties from the node, one of them being the publish_address, which it then tries to connect to, unsuccessfully. Or at least that's my guess.
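If that's the case, I suspect test-elk1 needs to advertise an address that test-elk2 can actually reach instead of the container's own IP, which as far as I understand is what network.publish_host (and transport.publish_port, when the port exposed on the VM differs from the container port) is for. This is roughly what I'd try on test-elk1, though I haven't verified it ("test-elk1" here being the VM hostname the other node resolves):
network.host: "0.0.0.0"
network.publish_host: "test-elk1"    # address the other VM can reach
# transport.publish_port: 9300       # only if the port exposed on the VM differs from 9300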
My configurations are:
test-elk1 elasticsearch.yml:
cluster.name: "test-elk"
node.name: "test-elk1"
network.host: "0.0.0.0"
discovery.seed_hosts: ["test-elk1","test-elk2"]
xpack.license.self_generated.type: trial
xpack.security.enabled: true
xpack.monitoring.collection.enabled: true
test-elk2 elasticsearch.yml:
cluster.name: "test-elk"
node:
  name: "test-elk2"
  master: true
  voting_only: false
  data: true
  ingest: false
network.host: "0.0.0.0"
discovery.seed_hosts: ["test-elk1"]
logger.org.elasticsearch.discovery: TRACE
logger.org.elasticsearch.transport: TRACE
xpack:
  license.self_generated.type: trial
  monitoring.collection.enabled: true
  security.enabled: true
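Following the same guess, an alternative I'm considering is setting the publish host through docker-compose rather than elasticsearch.yml, since the official image picks up settings passed as environment variables. A rough, unverified sketch for test-elk1 (the image tag and service name are placeholders for what I actually run):
services:
  elasticsearch:
    image: docker.elastic.co/elasticsearch/elasticsearch:7.8.0   # whichever 7.x tag is in use
    environment:
      - network.publish_host=test-elk1    # VM hostname reachable from test-elk2
    ports:
      - "9200:9200"
      - "9300:9300"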
If you need more info, let me know. Also, if you have a similar setup, could you share some config tips? I've been going by trial and error, but nothing gets the cluster to form. My guess is that the problem is in the elasticsearch.yml files, but I can't pinpoint where.