Master not discovered yet, this node has not previously joined a bootstrapped (v7+) cluster


(David Turner) #41

Ok, that looks like it succeeded:

[2019-04-11T13:41:51,674][INFO ][o.e.c.s.ClusterApplierService] [d-gp2-kyles-1] master node changed {previous [], current [{d-gp2-kyles-3}{DFvnIfwDRS-g1Z3nGRuogg}{0HE_55jUTOSQDJImo0GcZA}{10.124.193.72}{10.124.193.72:9300}{ml.machine_memory=4143783936, ml.max_open_jobs=20, xpack.installed=true}]}, added {{d-gp2-kyles-3}{DFvnIfwDRS-g1Z3nGRuogg}{0HE_55jUTOSQDJImo0GcZA}{10.124.193.72}{10.124.193.72:9300}{ml.machine_memory=4143783936, ml.max_open_jobs=20, xpack.installed=true},}, term: 1, version: 1, reason: ApplyCommitRequest{term=1, version=1, sourceNode={d-gp2-kyles-3}{DFvnIfwDRS-g1Z3nGRuogg}{0HE_55jUTOSQDJImo0GcZA}{10.124.193.72}{10.124.193.72:9300}{ml.machine_memory=4143783936, ml.max_open_jobs=20, xpack.installed=true}}

So now the question is, why doesn't this work for you with ${HOSTNAME}?


(Kyle Stephenson) #42

actually shows it's working now


(Kyle Stephenson) #43

yeah, that's weird ... and why it didn't work when i was doing my upgrade ... i believe i tried it with both ${HOSTNAME} and the actual name.


(David Turner) #44

Can you try echo -n $HOSTNAME | xxd to see if there's any weird characters in there that aren't being logged faithfully?


(Kyle Stephenson) #45

00000000: 642d 6770 322d 6b79 6c65 732d 31 d-gp2-kyles-1
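
For anyone without xxd to hand, the same check can be roughly sketched in Python. This is only an illustration: the function name find_suspicious_bytes is made up here, and treating every byte outside printable, non-space ASCII as suspicious is an assumption, not a rule that Elasticsearch applies.

```python
def find_suspicious_bytes(name: str) -> list[tuple[int, int]]:
    """Return (offset, byte value) pairs for bytes outside printable, non-space ASCII."""
    encoded = name.encode("utf-8")
    # 0x21..0x7E covers visible ASCII; spaces, control characters and
    # multi-byte sequences all fall outside it and get reported.
    return [(i, b) for i, b in enumerate(encoded) if not 0x21 <= b <= 0x7E]


# The bytes from the hex dump above, decoded back into a string.
DUMPED_NAME = bytes.fromhex("642d6770322d6b796c65732d31").decode("ascii")
```

An empty result for the value of HOSTNAME would suggest there is nothing hidden in the name, such as a trailing newline.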


(Kyle Stephenson) #46

maybe because i was putting the extension and hostname does not have the extension?


(Kyle Stephenson) #47

do you have to be consistent? i had node.name=hostname and i had network.host=ip_address


(Kyle Stephenson) #48

this was not an issue in 6.7.1 for me


(David Turner) #49

As I said above:

No extensions or anything, they need to be exactly the same.


(Kyle Stephenson) #50

ok, HOSTNAME gives me the shortname ... so i believe the problem is that i was specifying hostname with the extension and it was comparing the two and failing. i just went in and put the ${HOSTNAME} back and removed the extension everywhere else and it still works. i will try the upgrade again and let you know my findings.


(Kyle Stephenson) #51

yes, that was the issue ... they have to be identical: either node_name.blah.com everywhere or just node_name everywhere. it seems that elasticsearch creates a hashcode on the node_name, so they must be identical. thanks for your help.


(Safdar Ali) #52

I tried your recommendation, and then it popped up with this error:

org.elasticsearch.transport.RemoteTransportException: [elasticsearch-data-2][172.20.13.119:9300][internal:cluster/coordination/join/validate]
Caused by: org.elasticsearch.cluster.coordination.CoordinationStateRejectedException: join validation on cluster state with a different cluster uuid KkR7myqKQU6x02Qzd5KMIw than local cluster uuid U5B7kS8rQ12gS2sJeFfbIA, rejecting

at org.elasticsearch.cluster.coordination.JoinHelper.lambda$new$4(JoinHelper.java:147) ~[elasticsearch-7.0.0.jar:7.0.0]

at org.elasticsearch.xpack.security.transport.SecurityServerTransportInterceptor$ProfileSecuredRequestHandler$1.doRun(SecurityServerTransportInterceptor.java:251) ~[?:?]

at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) ~[elasticsearch-7.0.0.jar:7.0.0]

at org.elasticsearch.xpack.security.transport.SecurityServerTransportInterceptor$ProfileSecuredRequestHandler.messageReceived(SecurityServerTransportInterceptor.java:309) ~[?:?]

at org.elasticsearch.transport.RequestHandlerRegistry.processMessageReceived(RequestHandlerRegistry.java:63) ~[elasticsearch-7.0.0.jar:7.0.0]

at org.elasticsearch.transport.TcpTransport$RequestHandler.doRun(TcpTransport.java:1077) ~[elasticsearch-7.0.0.jar:7.0.0]

at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:751) ~[elasticsearch-7.0.0.jar:7.0.0]

at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) ~[elasticsearch-7.0.0.jar:7.0.0]

at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_202]

at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_202]

at java.lang.Thread.run(Thread.java:748) [?:1.8.0_202]


(David Turner) #53

@safderali5 would you open another thread about your issue(s) - this thread is marked as resolved, and the problems you're facing are different from the ones we dug into here.


(Safdar Ali) #54

OK, I will post it in a different thread.


(David Turner) #55

We have added a note to the docs to clarify this point lest it catch anyone else out:

The node names used in this list must exactly match the node.name properties of the nodes. By default the node name is set to the machine’s hostname which may or may not be fully-qualified depending on your system configuration. If each node name is a fully-qualified domain name such as master-a.example.com then you must use fully-qualified domain names in the cluster.initial_master_nodes list too; conversely if your node names are bare hostnames (without the .example.com suffix) then you must use bare hostnames in the cluster.initial_master_nodes list. If you use a mix of fully-qualified and bare hostnames, or there is some other mismatch between node.name and cluster.initial_master_nodes, then the cluster will not form successfully and you will see log messages like the following.

[master-a.example.com] master not discovered yet, this node has not previously joined a bootstrapped (v7+) cluster, and this node must discover master-eligible nodes [master-a, master-b] to bootstrap a cluster: have discovered [{master-b.example.com}{...

This message shows the node names master-a.example.com and master-b.example.com as well as the cluster.initial_master_nodes entries master-a and master-b, and it is apparent that they do not match exactly.
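
The exact-match requirement can be illustrated with a small Python sketch. It is not how Elasticsearch implements the check; the function name unmatched_initial_masters is hypothetical, and the code only assumes that the comparison is a plain, exact string comparison with no normalisation of domain suffixes.

```python
def unmatched_initial_masters(node_names, initial_master_nodes):
    """Return the cluster.initial_master_nodes entries that equal no node.name exactly."""
    known = set(node_names)
    # Plain string equality: "master-a" and "master-a.example.com" are different names.
    return [entry for entry in initial_master_nodes if entry not in known]


# Names taken from the log message quoted above.
discovered = ["master-a.example.com", "master-b.example.com"]
configured = ["master-a", "master-b"]
mismatched = unmatched_initial_masters(discovered, configured)
```

With the names from the log message every configured entry comes back as unmatched, which is the situation in which the cluster does not bootstrap. Using the fully-qualified names in both places, or the bare names in both places, leaves nothing unmatched.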


#56

I am also running on Kubernetes and had an issue bootstrapping the cluster in 7. My solution was to set an environment variable:
- name: cluster.initial_master_nodes
  valueFrom:
    fieldRef:
      fieldPath: metadata.name

I'm not sure this is the best solution; ideally I'd just configure the cluster to require 2/3 masters at any time like in the old ES. Using this, when the cluster first starts, the first node will already consider itself ready, and the cluster could end up partitioned if the nodes don't find the other masters immediately. Hopefully it will only matter once, but I'm not really sure yet how things will work out when updating the cluster.


(David Turner) #57

@jswid your formatting was mangled, but assuming you mean the following:

This is not recommended. From the docs:

You must set cluster.initial_master_nodes to the same list of nodes on each node on which it is set in order to be sure that only a single cluster forms during bootstrapping and therefore to avoid the risk of data loss.

With your suggestion you are configuring cluster.initial_master_nodes differently on each node, and there is a good chance that you will form more than one cluster.
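
To make the difference concrete, here is a rough Python sketch that counts how many distinct bootstrap configurations a set of nodes ends up with. The pod names es-master-0 to es-master-2 and the function name distinct_bootstrap_configs are hypothetical, and treating the list as an unordered set is an assumption made for this illustration.

```python
def distinct_bootstrap_configs(setting_by_node):
    """Return the distinct cluster.initial_master_nodes values found across nodes."""
    # frozenset ignores ordering, so only genuinely different lists count as distinct.
    return {frozenset(names) for names in setting_by_node.values()}


# Each pod is given only its own metadata.name, as with the fieldRef approach.
per_pod = {
    "es-master-0": ["es-master-0"],
    "es-master-1": ["es-master-1"],
    "es-master-2": ["es-master-2"],
}

# Every pod is given the same full list.
shared = {
    name: ["es-master-0", "es-master-1", "es-master-2"]
    for name in per_pod
}
```

Here per_pod yields three distinct configurations, one per node, so each node could bootstrap a cluster of its own, whereas shared yields exactly one.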


#58

Yes, thanks.. i even tried to delete my comment, but I guess it didn't take. I ended up doing it a different way. I changed the Deployment to a StatefulSet, which I think is better for two reasons: one is that the node names are constant, which solves the big issue in this thread, but the other is that the masters seem to care more about a cluster uuid now, so I am now mounting a persistent volume so the nodes' data folders are no longer lost when the masters are updated.

I have a public template on github that more or less shows how I am planning on moving to ES7 on Kubernetes here: https://github.com/jswidler/elasticsearch-kubed/blob/master/templates/2_elasticsearch/es-master.yml