Master not discovered yet, this node has not previously joined a bootstrapped (v7+) cluster, and [cluster.initial_master_nodes] is empty on this node

Hi, we just moved to Elasticsearch 7.0.0.

We're running into the issue shown below:

[2019-04-18T00:02:49,213][WARN ][o.e.c.c.ClusterFormationFailureHelper] [dev-efk-backend03] master not discovered yet, this node has not previously joined a bootstrapped (v7+) cluster, and [cluster.initial_master_nodes] is empty on this node: have discovered [{dev-efk-backend02}{iO6O8mpUQaGtCk8NWO5X9w}{n4xlapSxSAm6MxUMF6eDBA}{192.168.0.104}{192.168.0.104:9300}{ml.machine_memory=16171651072, ml.max_open_jobs=20, xpack.installed=true}, {dev-efk-backend05}{gKWS3gxIRs-P5FFlvMTm4A}{EgRlpCNPRZG5jtjhNuRHPQ}{192.168.0.239}{192.168.0.239:9300}{ml.machine_memory=16171651072, ml.max_open_jobs=20, xpack.installed=true}, {dev-efk-backend04}{WPeuO5I1SDWZkVTdpVGqlQ}{yJ6DOVOeSnWmzIDk5SnGmw}{192.168.0.21}{192.168.0.21:9300}{ml.machine_memory=16171651072, ml.max_open_jobs=20, xpack.installed=true}, {dev-efk-backend01}{9-ePQkHuS_6IAyGpYSCyDg}{T2I05C5UQKOn0UjMi94EgQ}{192.168.0.202}{192.168.0.202:9300}{ml.machine_memory=16171651072, ml.max_open_jobs=20, xpack.installed=true}]; discovery will continue using [192.168.0.202:9300, 192.168.0.104:9300, 192.168.0.239:9300, 192.168.0.21:9300] from hosts providers and [{dev-efk-backend03}{xYDB_57RTQaUmCzxQVMm-A}{Qg0WZl8VT-On8d64ook-Mw}{192.168.0.142}{192.168.0.142:9300}{ml.machine_memory=16171651072, xpack.installed=true, ml.max_open_jobs=20}] from last-known cluster state; node term 0, last-accepted version 0 in term 0

I followed the other topic (Master not discovered yet, this node has not previously joined a bootstrapped (v7+) cluster) and did some tweaking, so my elasticsearch.yml looks like this:

cluster.initial_master_nodes: dev-efk-backend01,dev-efk-backend02,dev-efk-backend05,dev-efk-backend03,dev-efk-backend04

cluster.name: elk-dev

discovery.seed_hosts: ip-192-168-0-202.ec2.internal:9300,ip-192-168-0-104.ec2.internal:9300,ip-192-168-0-239.ec2.internal:9300,ip-192-168-0-142.ec2.internal:9300,ip-192-168-0-21.ec2.internal:9300

http.port: 9200

network.host: 0.0.0.0

node.data: true

node.master: true

node.name: dev-efk-backend01

transport.tcp.port: 9300

I have two questions:

  1. Because we used the Ansible Elasticsearch playbook, the cluster.initial_master_nodes and discovery.seed_hosts values look different. Does the proper YAML syntax matter here?
    Do we have to set these two like this:

cluster.initial_master_nodes:
  - dev-efk-backend01
  - dev-efk-backend02
  - dev-efk-backend05
  - dev-efk-backend03
  - dev-efk-backend04

  2. I read in the other post that cluster.initial_master_nodes only needs to be set on the master node, and that you then start the master first and restart the rest of the nodes. Are this setting and that start order really necessary?

Thank you.

This disagrees with the rest of your post, in which I think you are saying that you have set this setting. Are you sure you are still getting this message? If not, can you share the exact message that you are now getting?

Yes, but this bit of your post is not correctly formatted so it's hard to see whether you're setting it right or not. YAML is whitespace-sensitive so it's important for us to see exactly what you're seeing. Please use the </> button when formatting your post, and check the preview window on the right to make sure it looks correct.

The setting is necessary, yes, but only on the master node(s) and only the first time you start the cluster. It shouldn't matter which order you start the nodes.
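
For reference, both of the following forms should end up as the same list as far as Elasticsearch is concerned (the node names are just copied from your post); the block-list form needs a space after each dash and consistent indentation:

# comma-separated string form (what your playbook currently renders):
cluster.initial_master_nodes: dev-efk-backend01,dev-efk-backend02,dev-efk-backend03,dev-efk-backend04,dev-efk-backend05

# equivalent YAML block-list form:
cluster.initial_master_nodes:
  - dev-efk-backend01
  - dev-efk-backend02
  - dev-efk-backend03
  - dev-efk-backend04
  - dev-efk-backend05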

Hi David,
Thank you for your quick reply.

For your questions:

  1. Yes, I have set this setting, but in this format (the formatting is under the Ansible Elasticsearch playbook's control: https://github.com/elastic/ansible-elasticsearch/blob/master/templates/elasticsearch.yml.j2):
cluster.initial_master_nodes: dev-efk-backend01,dev-efk-backend02,dev-efk-backend05,dev-efk-backend03,dev-efk-backend04
cluster.name: elk-dev
discovery.seed_hosts: ip-192-168-0-202.ec2.internal:9300,ip-192-168-0-104.ec2.internal:9300,ip-192-168-0-239.ec2.internal:9300,ip-192-168-0-142.ec2.internal:9300,ip-192-168-0-21.ec2.internal:9300
http.port: 9200
network.host: 0.0.0.0
node.data: true
node.master: true
node.name: dev-efk-backend01
transport.tcp.port: 9300

And since I'm using the Ansible playbook, the nodes are started one by one (in no particular order).

Because I'm still seeing [cluster.initial_master_nodes] is empty on this node after applying this setting, I suspected it had something to do with the YAML syntax.

  2. I will manually go into the master node (dev-efk-backend01) and adjust the syntax to something like this:
cluster.initial_master_nodes:
  - dev-efk-backend01
discovery.seed_hosts:
  - ip-192-168-0-202.ec2.internal:9300
  - ip-192-168-0-104.ec2.internal:9300
  - ip-192-168-0-239.ec2.internal:9300
  - ip-192-168-0-142.ec2.internal:9300 
  - ip-192-168-0-21.ec2.internal:9300
http.port: 9200
network.host: 0.0.0.0
node.data: true
node.master: true
node.name: dev-efk-backend01
transport.tcp.port: 9300
  3. I'm not sure what you mean by "the first time you start the cluster". Right now all five of our servers are up and running the five Elasticsearch nodes/services. Should I stop them all and restart them?

Thank you again.

Ok, is that the elasticsearch.yml file that Ansible is producing with your changes? If so, it looks reasonable, so I think I would try and investigate whether this is the config file that Elasticsearch is actually using.

(There's no mention of cluster.initial_master_nodes in the template file that you linked (https://github.com/elastic/ansible-elasticsearch/blob/master/templates/elasticsearch.yml.j2) and there's an open issue for 7.0 support in the Ansible repo.)

Sorry, I can be a little more precise: it's needed to elect the first master. If the nodes are up and running and have elected a master then all is good, you no longer need this setting. However if the nodes are running but they're all saying this node has not previously joined a bootstrapped (v7+) cluster then the cluster has not yet really started, so this setting is needed.
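
To make that concrete, here is a minimal sketch of what a master-eligible node's config could look like for that very first start (node names copied from your post; this is illustrative, not your full file):

node.name: dev-efk-backend01
node.master: true
discovery.seed_hosts:
  - ip-192-168-0-202.ec2.internal:9300
  - ip-192-168-0-104.ec2.internal:9300
# only consulted while bootstrapping the very first master election;
# once the cluster has formed it has no further effect and can be removed
cluster.initial_master_nodes:
  - dev-efk-backend01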

Hi David,

  1. Yes, we are passing a variable called es_config to that playbook:
    - name: Build Elasticsearch Cluster Config
      set_fact:
        es_config:
          'discovery.seed_hosts': "{{ es_config['discovery.seed_hosts'][:-1] }}"
          'cluster.initial_master_nodes': "{{ es_config['cluster.initial_master_nodes'][:-1] }}"
          node.name: '{{ ec2_tag_Name }}'
          network.host: '0.0.0.0'
          cluster.name: '{{ efk_es_cluster_name }}'
          http.port: '{{ efk_es_http_port }}'
          transport.tcp.port: '{{ efk_es_transport_tcp_port }}'
          node.data: true
          node.master: true

And via this line in https://github.com/elastic/ansible-elasticsearch/blob/master/templates/elasticsearch.yml.j2#L3 , it gets translated into what I pasted in the previous post (a list-based alternative is sketched at the end of this post).

  2. I have made changes on the master node and restarted all of them.
    I'm seeing different logs:

from master node: https://pastebin.com/ZyJ5vpRZ
from the other four:
node2: https://pastebin.com/9NDyE2FN
node3: https://pastebin.com/cmPc17YL
node4: https://pastebin.com/FbvCA9et
node5: https://pastebin.com/iKBVwwLD
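
For completeness, here's a sketch of how I could pass those two settings as YAML lists in es_config instead of comma-joined strings (untested, and it assumes the playbook writes list values into elasticsearch.yml as proper YAML; the remaining keys stay exactly as in the task above):

- name: Build Elasticsearch Cluster Config
  set_fact:
    es_config:
      # list values instead of comma-joined strings
      'cluster.initial_master_nodes':
        - dev-efk-backend01
      'discovery.seed_hosts':
        - ip-192-168-0-202.ec2.internal:9300
        - ip-192-168-0-104.ec2.internal:9300
        - ip-192-168-0-239.ec2.internal:9300
        - ip-192-168-0-142.ec2.internal:9300
        - ip-192-168-0-21.ec2.internal:9300
      # ... remaining keys (node.name, network.host, cluster.name, ports, node roles) unchanged ...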

That looks like everything is working: there's a master node and the other nodes have joined its cluster.

Hi David,
Good morning.
It still says "Kibana server is not ready yet".
The other four nodes are reporting exceptions. Should we just ignore them? I suspect they are related. It's the same exception being reported in the other topic/post:

{192.168.0.202}{192.168.0.202:9300}{ml.machine_memory=16171651072, ml.max_open_jobs=20, xpack.installed=true} (tried [1] times)
org.elasticsearch.transport.ConnectTransportException: [dev-efk-backend01][192.168.0.202:9300] connect_exception
	at org.elasticsearch.transport.TcpTransport$ChannelsConnectedListener.onFailure(TcpTransport.java:1299) ~[elasticsearch-7.0.0.jar:7.0.0]
	at org.elasticsearch.action.ActionListener.lambda$toBiConsumer$2(ActionListener.java:99) ~[elasticsearch-7.0.0.jar:7.0.0]
	at org.elasticsearch.common.concurrent.CompletableContext.lambda$addListener$0(CompletableContext.java:42) ~[elasticsearch-core-7.0.0.jar:7.0.0]
	at java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:859) ~[?:?]
	at java.util.concurrent.CompletableFuture.uniWhenCompleteStage(CompletableFuture.java:883) ~[?:?]
	at java.util.concurrent.CompletableFuture.whenComplete(CompletableFuture.java:2322) ~[?:?]
	at org.elasticsearch.common.concurrent.CompletableContext.addListener(CompletableContext.java:45) ~[elasticsearch-core-7.0.0.jar:7.0.0]
	at org.elasticsearch.transport.netty4.Netty4TcpChannel.addConnectListener(Netty4TcpChannel.java:100) ~[?:?]
	at org.elasticsearch.transport.TcpTransport.initiateConnection(TcpTransport.java:325) ~[elasticsearch-7.0.0.jar:7.0.0]
	at org.elasticsearch.transport.TcpTransport.openConnection(TcpTransport.java:292) ~[elasticsearch-7.0.0.jar:7.0.0]
	at org.elasticsearch.transport.ConnectionManager.internalOpenConnection(ConnectionManager.java:206) ~[elasticsearch-7.0.0.jar:7.0.0]
	at org.elasticsearch.transport.ConnectionManager.connectToNode(ConnectionManager.java:104) ~[elasticsearch-7.0.0.jar:7.0.0]
	at org.elasticsearch.transport.TransportService.connectToNode(TransportService.java:343) ~[elasticsearch-7.0.0.jar:7.0.0]
	at org.elasticsearch.transport.TransportService.connectToNode(TransportService.java:330) ~[elasticsearch-7.0.0.jar:7.0.0]
	at org.elasticsearch.cluster.NodeConnectionsService.validateAndConnectIfNeeded(NodeConnectionsService.java:153) [elasticsearch-7.0.0.jar:7.0.0]
	at org.elasticsearch.cluster.NodeConnectionsService$1.doRun(NodeConnectionsService.java:106) [elasticsearch-7.0.0.jar:7.0.0]
	at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:751) [elasticsearch-7.0.0.jar:7.0.0]
	at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) [elasticsearch-7.0.0.jar:7.0.0]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) [?:?]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) [?:?]
	at java.lang.Thread.run(Thread.java:835) [?:?]
Caused by: io.netty.channel.AbstractChannel$AnnotatedConnectException: Connection refused: 192.168.0.202/192.168.0.202:9300
	at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method) ~[?:?]
	at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:779) ~[?:?]
	at io.netty.channel.socket.nio.NioSocketChannel.doFinishConnect(NioSocketChannel.java:327) ~[?:?]
	at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.finishConnect(AbstractNioChannel.java:340) ~[?:?]
	at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:644) ~[?:?]
	at io.netty.channel.nio.NioEventLoop.processSelectedKeysPlain(NioEventLoop.java:556) ~[?:?]
	at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:510) ~[?:?]
	at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:470) ~[?:?]
	at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:909) ~[?:?]
	... 1 more
Caused by: java.net.ConnectException: Connection refused
	at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method) ~[?:?]
	at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:779) ~[?:?]
	at io.netty.channel.socket.nio.NioSocketChannel.doFinishConnect(NioSocketChannel.java:327) ~[?:?]
	at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.finishConnect(AbstractNioChannel.java:340) ~[?:?]
	at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:644) ~[?:?]
	at io.netty.channel.nio.NioEventLoop.processSelectedKeysPlain(NioEventLoop.java:556) ~[?:?]
	at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:510) ~[?:?]
	at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:470) ~[?:?]
	at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:909) ~[?:?]
	... 1 more

This exception says that this node can't connect to 192.168.0.202:9300. I can't really say whether that's a problem. Do you think it should be able to connect to that address? Is the cluster otherwise healthy?

Yeah, 192.168.0.202 is the master node (node1).
I noticed I have one setting in the elasticsearch.yml:
transport.tcp.port: 9300
But in the 7.0 documentation, there is no such field.
https://www.elastic.co/guide/en/elasticsearch/reference/current/modules-transport.html

Ok, in which case it seems bad that this node can't connect to it. Is the master node ok? Still running? Anything of interest in its logs from around the same time?

I think this is the old name for transport.port, see for instance the 6.6 version of the same page.
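
If you want to move to the new name, I believe the change in elasticsearch.yml would just be (keeping your current value):

# transport.port is the 7.x name; the old transport.tcp.port should still be accepted for now
transport.port: 9300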

Hi David,
You are right.
I believe the cluster has formed.
The connection refused on 9300 only happens when I shut down the master node; the other nodes can't connect to it then.
I suspect the 'Kibana server is not ready' alert has something to do with the Kibana settings.
The error is

-- Logs begin at Tue 2019-04-23 16:03:27 UTC, end at Tue 2019-04-23 20:53:52 UTC. --
Apr 23 20:53:36 ip-192-168-0-21.ec2.internal kibana[16822]:{"type":"log","@timestamp":"2019-04-23T20:53:36Z","tags":["fatal","root"],"pid":16822,
Apr 23 20:53:36 ip-192-168-0-21.ec2.internal kibana[16822]: FATAL  ValidationError: child "elasticsearch" fails because ["host" is not allowed]

Do you happen to know what changes Kibana 7 has made that could cause this, or should I just create a new topic?

That makes sense, yes. I would expect messages like that if you shut down a node.

I do not, sorry; I don't know Kibana very well. I can't see anything obvious in the breaking changes docs, so try asking in the Kibana forum.

Thank you very much, David!
I didn't get a notification, which is why I hadn't replied.
I'm all set now.
Should I close this topic or only you can do that?

Oh, before you close it, you mentioned:

So that means my setting for this variable could look like this?

cluster.initial_master_nodes:
  - dev-efk-backend01
  - dev-efk-backend02
  - dev-efk-backend03
  - dev-efk-backend04
  - dev-efk-backend05

I read in the docs that we can have more than one, but I'm just not sure whether we can list all of them?

Yes, that looks right: you'd normally put all of your master nodes in this list.
