Cluster not formed

Usually everything works well; the hosts' IPs and ports are configured properly and were not changed.
Why didn't the cluster form this time?


Elastic-1


[2017-12-19T13:03:18,058][INFO ][o.e.d.DiscoveryModule    ] [node-1] using discovery type [zen]
[2017-12-19T13:03:18,417][INFO ][o.e.n.Node               ] [node-1] initialized
[2017-12-19T13:03:18,417][INFO ][o.e.n.Node               ] [node-1] starting ...
[2017-12-19T13:03:18,539][INFO ][o.e.t.TransportService   ] [node-1] publish_address {172.16.65.114:9300}, bound_addresses {172.16.65.114:9300}
[2017-12-19T13:03:18,543][INFO ][o.e.b.BootstrapChecks    ] [node-1] bound or publishing to a non-loopback or non-link-local address, enforcing bootstrap checks
[2017-12-19T13:03:48,556][WARN ][o.e.n.Node               ] [node-1] timed out while waiting for initial discovery state - timeout: 30s
[2017-12-19T13:03:48,563][INFO ][o.e.h.n.Netty4HttpServerTransport] [node-1] publish_address {172.16.65.114:9200}, bound_addresses {172.16.65.114:9200}
[2017-12-19T13:03:48,565][INFO ][o.e.n.Node               ] [node-1] started



Elastic-2

[2017-12-19T13:03:20,525][INFO ][o.e.t.TransportService   ] [node-2] publish_address {172.16.65.117:9300}, bound_addresses {172.16.65.117:9300}
[2017-12-19T13:03:20,530][INFO ][o.e.b.BootstrapChecks    ] [node-2] bound or publishing to a non-loopback or non-link-local address, enforcing bootstrap checks
[2017-12-19T13:03:25,462][INFO ][o.e.c.s.ClusterService   ] [node-2] detected_master {node-3}{7KuCh13YQ4Knusm_wE0oCg}{Dq4AP-L2TyWPOdrxLaLtHg}{172.16.67.71}{172.16.67.71:9300}, added {{node-3}{7KuCh13YQ4Knusm_wE0oCg}{Dq4AP-L2TyWPOdrxLaLtHg}{172.16.67.71}{172.16.67.71:9300},}, reason: zen-disco-receive(from master [master {node-3}{7KuCh13YQ4Knusm_wE0oCg}{Dq4AP-L2TyWPOdrxLaLtHg}{172.16.67.71}{172.16.67.71:9300} committed version [1]])
[2017-12-19T13:03:25,469][INFO ][o.e.h.n.Netty4HttpServerTransport] [node-2] publish_address {172.16.65.117:9200}, bound_addresses {172.16.65.117:9200}
[2017-12-19T13:03:25,472][INFO ][o.e.n.Node               ] [node-2] started

Elastic-3

[2017-12-19T13:03:22,310][INFO ][o.e.n.Node               ] [node-3] starting ...
[2017-12-19T13:03:22,419][INFO ][o.e.t.TransportService   ] [node-3] publish_address {172.16.67.71:9300}, bound_addresses {172.16.67.71:9300}
[2017-12-19T13:03:22,424][INFO ][o.e.b.BootstrapChecks    ] [node-3] bound or publishing to a non-loopback or non-link-local address, enforcing bootstrap checks
[2017-12-19T13:03:25,454][INFO ][o.e.c.s.ClusterService   ] [node-3] new_master {node-3}{7KuCh13YQ4Knusm_wE0oCg}{Dq4AP-L2TyWPOdrxLaLtHg}{172.16.67.71}{172.16.67.71:9300}, added {{node-2}{aitC5uqmRtKedcr3YT4AQw}{yM7opvE7Siu7HdmpXTciNw}{172.16.65.117}{172.16.65.117:9300},}, reason: zen-disco-elected-as-master ([1] nodes joined)[{node-2}{aitC5uqmRtKedcr3YT4AQw}{yM7opvE7Siu7HdmpXTciNw}{172.16.65.117}{172.16.65.117:9300}]
[2017-12-19T13:03:25,474][INFO ][o.e.h.n.Netty4HttpServerTransport] [node-3] publish_address {172.16.67.71:9200}, bound_addresses {172.16.67.71:9200}
[2017-12-19T13:03:25,476][INFO ][o.e.n.Node               ] [node-3] started
[2017-12-19T13:03:25,729][INFO ][o.e.g.GatewayService     ] [node-3] recovered [1] indices into cluster_state
[2017-12-19T13:03:26,040][INFO ][o.e.c.r.a.AllocationService] [node-3] Cluster health status changed from [RED] to [YELLOW] (reason: [shards started [[events_1513680805055][0]] ...]).
[2017-12-19T13:03:38,787][WARN ][o.e.c.a.s.ShardStateAction] [node-3] [events_1513680805055][0] received shard failed for shard id [[events_1513680805055][0]], allocation id [CwQRlRIOQzaNbB4bnc4Ydg], primary term [16], message [mark copy as stale]
[2017-12-19T13:03:53,879][WARN ][o.e.d.z.ZenDiscovery     ] [node-3] not enough master nodes (has [1], but needed [2]), current nodes: nodes: 
   {node-2}{aitC5uqmRtKedcr3YT4AQw}{yM7opvE7Siu7HdmpXTciNw}{172.16.65.117}{172.16.65.117:9300}
   {node-3}{7KuCh13YQ4Knusm_wE0oCg}{Dq4AP-L2TyWPOdrxLaLtHg}{172.16.67.71}{172.16.67.71:9300}, local, master

My transport client tried to connect to the cluster and perform write requests, and got:


org.elasticsearch.cluster.block.ClusterBlockException: blocked by: [SERVICE_UNAVAILABLE/1/state not recovered / initialized];

This is Elasticsearch 5.4.

And why did one Elasticsearch node start even though minimum_master_nodes is configured as 2?

elasticsearch.yml:

discovery.zen.minimum_master_nodes: 2
discovery.zen.commit_timeout: 2s
discovery.zen.publish_timeout: 2s
discovery.zen.fd.ping_timeout: 1s
transport.tcp.connect_timeout: 1s

Apparently Node 1 cannot be seen by the others.

You started Node2. It waited until Node3 joined. They formed the cluster.

But Node1 is still not there.
I'd suggest restarting Node1.
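Also worth double-checking that every node's elasticsearch.yml lists all three hosts for discovery; with three master-eligible nodes, minimum_master_nodes: 2 is the right quorum (3/2 + 1, rounded down). A minimal sketch of that setting, with the IPs taken from your logs above (so treat the values as assumptions):

discovery.zen.ping.unicast.hosts: ["172.16.65.114", "172.16.65.117", "172.16.67.71"]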

Why is my client not working? I have 2 nodes, as you said, but I guess it connects to only 1...
I want it to fail...
I am using


client.addTransportAddress(new InetSocketTransportAddress(address, EventsConstants.ES_PORT));

3 times, once for each of the three addresses.
If it cannot connect to all 3, I want it to fail.
In addition, why was node-1 not visible? Are you suggesting a network issue?
And why didn't node-1 shut down when it didn't have enough master nodes?

Are you suggesting a network issue?

Yes, it might be.

And why didn't node-1 shut down when it didn't have enough master nodes?

It is waiting for enough master nodes to come online. Once it can find them, it will join the cluster.

So why isn't a client configured to connect to 3 nodes throwing any exception in this kind of scenario?
What API can I use to find out which nodes are in which cluster?
Is there any timeout for the node to understand that it can't see its "friends" in the cluster? And to understand that it is alone and should go down?

So why isn't a client configured to connect to 3 nodes throwing any exception in this kind of scenario?

That's a valid point. The node is available but can't really deal with requests, as it is not part of the cluster, which is not correct behavior. I mean that when you have a node like this, you should fix the problem ASAP before using any client.
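In the meantime, you can fail fast on the client side by checking how many nodes the transport client actually connected to before sending writes. A minimal sketch for a 5.x transport client, assuming a cluster name of "my-cluster" and reusing your three addresses (both are assumptions on my side; run it inside a method that can throw UnknownHostException):

import java.net.InetAddress;
import org.elasticsearch.client.transport.TransportClient;
import org.elasticsearch.common.settings.Settings;
import org.elasticsearch.common.transport.InetSocketTransportAddress;
import org.elasticsearch.transport.client.PreBuiltTransportClient;

// Register all three nodes, then fail fast if the initial handshake
// did not reach every one of them.
TransportClient client = new PreBuiltTransportClient(Settings.builder()
        .put("cluster.name", "my-cluster")   // assumption: use your cluster name
        .build());
for (String host : new String[]{"172.16.65.114", "172.16.65.117", "172.16.67.71"}) {
    client.addTransportAddress(
            new InetSocketTransportAddress(InetAddress.getByName(host), 9300));
}
if (client.connectedNodes().size() < 3) {    // connectedNodes() lists reachable nodes
    client.close();
    throw new IllegalStateException(
            "expected 3 connected nodes, got " + client.connectedNodes().size());
}

Note that this only checks reachability: a node that is up but never joined the cluster will still count as "connected" and then reject writes with the ClusterBlockException you saw.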

what API can I use in order to know which nodes are in which cluster?

Probably something like the Nodes Info API?
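With the transport client it could look like this (a sketch, reusing the client from your snippet):

import org.elasticsearch.action.admin.cluster.node.info.NodeInfo;
import org.elasticsearch.action.admin.cluster.node.info.NodesInfoResponse;

// Ask the node(s) the client reached which nodes they currently see.
NodesInfoResponse info = client.admin().cluster().prepareNodesInfo().get();
System.out.println("cluster: " + info.getClusterName().value());
for (NodeInfo node : info.getNodes()) {
    System.out.println(node.getNode().getName() + " -> " + node.getNode().getAddress());
}

Pointing a separate client at each node and comparing the answers should show you who ended up in which cluster.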

Is there any timeout for the node to understand that it can't see its "friends" in the cluster?

As far as I remember, it's 30s by default before the WARN is printed; that's the "timed out while waiting for initial discovery state - timeout: 30s" line in your node-1 log.

And to understand that it is alone and should go down?

That will never happen on its own. If you want to shut it down, that must be something you control, because if it's a network issue, the network can come back and the node can join the cluster again.
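If you do want that behavior, it has to live outside Elasticsearch, in something you run yourself. A minimal watchdog sketch (the expected node count of 3 and the reaction are assumptions on my side):

import org.elasticsearch.action.admin.cluster.health.ClusterHealthResponse;
import org.elasticsearch.common.unit.TimeValue;

// Poll cluster health; if fewer nodes than expected are visible,
// react with whatever alerting/shutdown procedure you control.
ClusterHealthResponse health = client.admin().cluster().prepareHealth()
        .setTimeout(TimeValue.timeValueSeconds(5))
        .get();
if (health.getNumberOfNodes() < 3) {
    // e.g. alert an operator, or stop the node via your service manager
}

On a node that never joined the cluster, the health call itself may fail with the same ClusterBlockException, which is just as good a signal to act on.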

Thank you, I think I will try using the Nodes Info API and handle it from there.
