Hi there.
I'm trying to deploy 3 master, 2 data, and 2 ingest nodes to Kubernetes on GCP.
The problem is that the master nodes struggle to discover each other.
Here are the logs I get: "master not discovered yet, this node has not previously joined a bootstrapped (v7+) cluster, and this node must discover master-eligible nodes [es-master-0, es-master-1, es-master-2] to bootstrap a cluster: have discovered [{es-master-0}{SPyxwBwVQJ2SVlsI2pLZFg}{62jGZWMZR_KxyGx-OQSC3A}{10.84.0.141}{10.84.0.141:9300}{m}]; discovery will continue using [127.0.0.1:9300, 127.0.0.1:9301, 127.0.0.1:9302, 127.0.0.1:9303, 127.0.0.1:9304, 127.0.0.1:9305] from hosts providers and [{es-master-0}{SPyxwBwVQJ2SVlsI2pLZFg}{62jGZWMZR_KxyGx-OQSC3A}{10.84.0.141}{10.84.0.141:9300}{m}] from last-known cluster state; node term 0, last-accepted version 0 in term 0"
The weird thing is that es-master-0 discovers only es-master-0, and it's the same for the other master pods.
Here is my Elasticsearch.yaml:
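The discovery-related part (a sketch; node names as in the error above, the rest of the file is omitted):

```yaml
# elasticsearch.yml — discovery-related settings only
cluster.name: es-backup-cluster
node.name: es-master-0            # each pod sets its own name
cluster.initial_master_nodes:
  - es-master-0
  - es-master-1
  - es-master-2
# note: no discovery.seed_hosts here, so discovery falls back to localhost
```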
You need to tell Elasticsearch how to discover the other master nodes, typically by setting discovery.seed_hosts. By default discovery will only try a few ports on localhost:
discovery will continue using [127.0.0.1:9300, 127.0.0.1:9301, 127.0.0.1:9302, 127.0.0.1:9303, 127.0.0.1:9304, 127.0.0.1:9305] from hosts providers
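In your case that would be something like this (a sketch using the node names from your logs; whatever you list here must be resolvable and reachable from every pod):

```yaml
# Addresses of the master-eligible nodes to contact during discovery
discovery.seed_hosts:
  - es-master-0
  - es-master-1
  - es-master-2
```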
I added discovery.seed_hosts, listing the same node names as in cluster.initial_master_nodes.
The previous error disappeared, but there is a new problem: {"type": "server", "timestamp": "2021-11-12T16:33:35,960Z", "level": "WARN", "component": "o.e.d.SeedHostsResolver", "cluster.name": "es-backup-cluster", "node.name": "es-master-0", "message": "failed to resolve host [es-master-1]" "java.net.UnknownHostException: es-master-1: Name or service not known"
Yes, as @ahmed_charafouddine says: Elasticsearch is doing a DNS lookup for es-master-1, but your DNS doesn't know this name. Either add this hostname to DNS or else just use its address.
OK, got it. Now I'm trying to use the pods' addresses. Can you help with this, please?
I'm trying to check the connection from one pod to another using this pattern: pod-name.service-name.namespace.svc.cluster.local
In my case that is es-master-2.es-master.es-cluster.svc.cluster.local, and I cannot access it. Am I doing it wrong?
By the way, I can connect to the service like this: nc es-master.es-cluster.svc.cluster.local
I'm not sure; this sounds like a networking-and-containers question rather than anything specific to Elasticsearch, so this might not be the best place to ask.
I've found a way to connect from one master k8s pod to another
here is an address that I use: es-master-2.es-master.es-cluster.svc.cluster.local
I've added these addresses to discovery.seed_hosts
and now I have the same error as before: "master not discovered yet, this node has not previously joined"
here is my current Elasticsearch.yaml:
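The discovery part now looks roughly like this (sketch):

```yaml
# per-pod DNS names: podName.serviceName.namespace.svc.cluster.local
discovery.seed_hosts:
  - es-master-0.es-master.es-cluster.svc.cluster.local
  - es-master-1.es-master.es-cluster.svc.cluster.local
  - es-master-2.es-master.es-cluster.svc.cluster.local
# cluster.initial_master_nodes is not set at this point (see the error below)
```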
"master not discovered yet, this node has not previously joined a bootstrapped (v7+) cluster, and [cluster.initial_master_nodes] is empty on this node: have discovered [{es-master-0}{vUoyCU2sQ5OFnXv9Bj-LHA}{ri7xpeDlSuOclHnkuiXvbg}{10.84.0.150}{10.84.0.150:9300}{m}, {es-master-1}{CpLNUEKlQP6ZSHtL-6LjEQ}{AG17cWBqS7aZlQSXUuAtqw}{10.84.2.119}{10.84.2.119:9300}{m}, {es-master-2}{b4DtEy9lTnGVCRm-RareKg}{344M5bIrSwGbNsL0zYgaNA}{10.84.3.118}{10.84.3.118:9300}{m}]; discovery will continue using [10.84.2.119:9300, 10.84.3.118:9300] from hosts providers and [{es-master-0}{vUoyCU2sQ5OFnXv9Bj-LHA}{ri7xpeDlSuOclHnkuiXvbg}{10.84.0.150}{10.84.0.150:9300}{m}] from last-known cluster state; node term 0, last-accepted version 0 in term 0"
Are these addresses correct? Are there any other messages in the logs? What version are you using? I think in recent versions it'll be logging more details too.
added cluster.initial_master_nodes,
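roughly like this (a sketch; the values match the "must discover master-eligible nodes [...]" list in the error below):

```yaml
cluster.initial_master_nodes:
  - es-master-0.es-master.es-cluster.svc.cluster.local
  - es-master-1.es-master.es-cluster.svc.cluster.local
  - es-master-2.es-master.es-cluster.svc.cluster.local
```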
and still a similar issue to the first one: master-0 discovers only master-0, and the same for the other pods: "master not discovered yet, this node has not previously joined a bootstrapped (v7+) cluster, and this node must discover master-eligible nodes [es-master-0.es-master.es-cluster.svc.cluster.local, es-master-1.es-master.es-cluster.svc.cluster.local, es-master-2.es-master.es-cluster.svc.cluster.local] to bootstrap a cluster: have discovered [{es-master-0}{wJc_UcmwTVaQhL0TBOS93A}{yf9TrRd7QUWTWq77Lncdow}{10.84.0.151}{10.84.0.151:9300}{m}, {es-master-1}{tfghLpmVQLmNN2H_O_ituw}{VRlYo2PvR2-GXkS0YvmzmQ}{10.84.2.120}{10.84.2.120:9300}{m}, {es-master-2}{-jP3pAubQ8i1WMIYCGHjoQ}{oJ3Z1nfHR2CZ3CSYZwE58w}{10.84.3.119}{10.84.3.119:9300}{m}]; discovery will continue using [10.84.2.120:9300, 10.84.3.119:9300] from hosts providers and [{es-master-0}{wJc_UcmwTVaQhL0TBOS93A}{yf9TrRd7QUWTWq77Lncdow}{10.84.0.151}{10.84.0.151:9300}{m}] from last-known cluster state; node term 0, last-accepted version 0 in term 0"
cluster.initial_master_nodes should be set to the node names (es-master-0, etc), not their fully-qualified DNS names. The setup in your first post was right. See these docs for more information.
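Something like this should work (a sketch using the names from this thread):

```yaml
# seed hosts: resolvable addresses used to find the other master nodes
discovery.seed_hosts:
  - es-master-0.es-master.es-cluster.svc.cluster.local
  - es-master-1.es-master.es-cluster.svc.cluster.local
  - es-master-2.es-master.es-cluster.svc.cluster.local
# initial masters: must exactly match each node's node.name
cluster.initial_master_nodes:
  - es-master-0
  - es-master-1
  - es-master-2
```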
In my setup I use a StatefulSet for the master and data nodes.
All the nodes have both cluster.initial_master_nodes and discovery.seed_hosts configured as shown above. es-master-0.es-master.es-cluster.svc.cluster.local follows the pattern podName.serviceName.namespace.svc.cluster.local.
If no namespace is specified, put "default" there.
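For completeness: these per-pod names only resolve because the StatefulSet is paired with a headless Service. A sketch of such a Service (names from this thread; labels and ports are assumptions):

```yaml
# Headless Service that gives each StatefulSet pod a stable DNS name:
#   <podName>.<serviceName>.<namespace>.svc.cluster.local
apiVersion: v1
kind: Service
metadata:
  name: es-master
  namespace: es-cluster
spec:
  clusterIP: None                  # headless: DNS resolves straight to pod IPs
  publishNotReadyAddresses: true   # masters must see each other before becoming ready
  selector:
    app: es-master                 # assumed; must match the StatefulSet's pod labels
  ports:
    - name: transport
      port: 9300
```

The StatefulSet also has to reference it via serviceName: es-master, otherwise the per-pod DNS records are not created.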
That's great, thanks for confirming it all works now.
I think this shouldn't be your final config, though: you should remove cluster.initial_master_nodes after the first time the cluster forms. See these docs for more info.