Master is not joining cluster

ghanshyam · October 1, 2017, 7:48pm

I have 3 master nodes. Two are up, but the third one is not joining.
Node1: c810wem.int.thomsonreuters.com
node.name: ${HOSTNAME}-master-${ES_HTTP_PORT}
http.cors.enabled: true
tr.security.transport.ipfilter: "+n:c628agv.int.thomsonreuters.com, +n:c219usb.int.thomsonreuters.com, +n:c128rrs.int.thomsonreuters.com, +n:c810wem.int.thomsonreuters.com, +n:c014typ.int.thomsonreuters.com, +n:localhost, -n:"
path.plugins: ${INSTANCE_BASE}/plugins
cluster.routing.allocation.same_shard.host: true
bootstrap.mlockall: true
path.data: /data/es
cluster.name: es-WLN-TypeAhead-Lab
tr.security.authorization.cache.refresh_minutes: 5
http.port: ${ES_HTTP_PORT}
node.master: true
tr.security.enabled: true
discovery.zen.minimum_master_nodes: 3
tr.security.authentication.basic.enabled: true
tr.security.authentication.safe.enabled: true
http.cors.allow-credentials: true
path.logs: /log
http.cors.allow-headers: "X-Requested-With, Content-Type, Content-Length, Authorization"
+tr.security.authentication.safe.app_key: tyIBqgOBJbiBpbpCbL5/1K5XHQVsJk9fMZ6jb5PlEPg=;EYHm7MewFCWbw8KqI3B2Nw==
discovery.zen.ping.unicast.hosts: ["c628agv.int.thomsonreuters.com:9300", "c014typ.int.thomsonreuters.com:9301", "c810wem.int.thomsonreuters.com:9302"]
security.manager.enabled: false
node.appserver_instance: ${APPSERVER_INSTANCE}
network.host: 0
node.data: false
http.cors.allow-origin: ""
tr.version: ${RELEASE}

Node2: c014typ.int.thomsonreuters.com

http.cors.enabled: true
tr.security.transport.ipfilter: "+n:c628agv.int.thomsonreuters.com, +n:c219usb.int.thomsonreuters.com, +n:c128rrs.int.thomsonreuters.com, +n:c810wem.int.thomsonreuters.com, +n:c014typ.int.thomsonreuters.com, +n:localhost, -n:"
path.plugins: ${INSTANCE_BASE}/plugins
cluster.routing.allocation.same_shard.host: true
bootstrap.mlockall: true
path.data: /data/es
cluster.name: es-WLN-TypeAhead-Lab
tr.security.authorization.cache.refresh_minutes: 5
http.port: ${ES_HTTP_PORT}
node.master: true
tr.security.enabled: true
discovery.zen.minimum_master_nodes: 3
tr.security.authentication.basic.enabled: true
tr.security.authentication.safe.enabled: true
http.cors.allow-credentials: true
path.logs: /log
http.cors.allow-headers: "X-Requested-With, Content-Type, Content-Length, Authorization"
discovery.zen.ping.unicast.hosts: ["c628agv.int.thomsonreuters.com:9300", "c014typ.int.thomsonreuters.com:9300", "c014typ.int.thomsonreuters.com:9301"]
security.manager.enabled: false
node.appserver_instance: ${APPSERVER_INSTANCE}
network.host: 0
node.data: false
http.cors.allow-origin: ""
tr.version: ${RELEASE}
transport.tcp.port: ${ES_TRANSPORT_PORT}
discovery.zen.ping.multicast.enabled: false

Node 3, which is not joining: c628agv.int.thomsonreuters.com

http.cors.enabled: true
tr.security.transport.ipfilter: "+n:c628agv.int.thomsonreuters.com, +n:c219usb.int.thomsonreuters.com, +n:c128rrs.int.thomsonreuters.com, +n:c810wem.int.thomsonreuters.com, +n:c014typ.int.thomsonreuters.com, +n:localhost, -n:"
path.plugins: ${INSTANCE_BASE}/plugins
cluster.routing.allocation.same_shard.host: true
bootstrap.mlockall: true
path.data: /data/es
cluster.name: es-WLN-TypeAhead-Lab
tr.security.authorization.cache.refresh_minutes: 5
http.port: ${ES_HTTP_PORT}
node.master: true
tr.security.enabled: true
discovery.zen.minimum_master_nodes: 3
tr.security.authentication.basic.enabled: true
tr.security.authentication.safe.enabled: true
http.cors.allow-credentials: true
path.logs: /log
http.cors.allow-headers: "X-Requested-With, Content-Type, Content-Length, Authorization"
discovery.zen.ping.unicast.hosts: ["c628agv.int.thomsonreuters.com:9303", "c014typ.int.thomsonreuters.com:9301", "c810wem.int.thomsonreuters.com:9302"]
security.manager.enabled: false
node.appserver_instance: ${APPSERVER_INSTANCE}
node.data: false
http.cors.allow-origin: ""
tr.version: ${RELEASE}
transport.tcp.port: ${ES_TRANSPORT_PORT}
discovery.zen.ping.multicast.enabled: false

and the error is:

    --> ping_response{node [{c014typ-client-9201}{zrNZVNhDQbeb2Q5WgOx-Fg}{10.204.101.87}{10.204.101.87:9301}{data=false, appserver_instance=es-WLN-TypeAhead-Lab_2.4.2.2_9401, master=true}], id[10848], master [{c810wem-master-9200}{GtZClXtTR-2iOz58_VFC7A}{10.204.101.86}{10.204.101.86:9300}{data=false, appserver_instance=es-WLN-TypeAhead-Lab_2.4.2.2_9400, master=true}], hasJoinedOnce [true], cluster_name[es-WLN-TypeAhead-Lab]}

[2017-10-01T18:57:26.295Z][DEBUG][discovery.zen.publish][c628agv-master-9200] received full cluster state version 82 with size 168267
[2017-10-01T18:57:26.297Z][DEBUG][discovery.zen.fd][c628agv-master-9200] [master] restarting fault detection against master [{c810wem-master-9200}{GtZClXtTR-2iOz58_VFC7A}{10.204.101.86}{10.204.101.86:9300}{data=false, appserver_instance=es-WLN-TypeAhead-Lab_2.4.2.2_9400, master=true}], reason [new cluster state received and we are monitoring the wrong master [null]]

o be in VALID state...
[2017-10-01T19:16:12.292Z][INFO][com.trgr.elasticsearch.plugin.security.SecurityBootstrapService][c628agv-master-9200] Waiting for Cluster to be in VALID state...
[2017-10-01T19:16:13.295Z][INFO][com.trgr.elasticsearch.plugin.security.SecurityBootstrapService][c628agv-master-9200] Waiting for Cluster to be in VALID state...
[2017-10-01T19:16:14.298Z][INFO][com.trgr.elasticsearch.plugin.security.SecurityBootstrapService][c628agv-master-9200] Waiting for Cluster to be in VALID state...
[2017-10-01T19:16:15.301Z][INFO][com.trgr.elasticsearch.plugin.security.SecurityBootstrapService][c628agv-master-9200] Waiting for Cluster to be in VALID state...
[2017-10-01T19:16:16.304Z][INFO][com.trgr.elasticsearch.plugin.security.SecurityBootstrapService][c628agv-master-9200] Waiting for Cluster to be in VALID state...
[2017-10-01T19:16:17.308Z][INFO][com.trgr.elasticsearch.plugin.security.SecurityBootstrapService][c628agv-master-9200] Waiting for Cluster to be in VALID state...
[2017-10-01T19:16:18.311Z][INFO][com.trgr.elasticsearch.plugin.security.SecurityBootstrapService][c628agv-master-9200] Waiting for Cluster to be in VALID state...

warkolm · October 1, 2017, 9:15pm

What are all the settings that begin with tr.?

ghanshyam · October 2, 2017, 2:22pm

These are our custom security settings, which we use instead of Shield.

Christian_Dahlqvist · October 2, 2017, 2:31pm

If you have a completely custom security implementation in place that could impact the traffic within the cluster, I suspect it may be difficult for anyone here to be able to help out.

On another probably unrelated note, you should really have minimum_master_nodes set to 2 instead of 3 assuming you have 3 master eligible nodes.
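
For reference, the value 2 comes from the usual quorum rule for Zen discovery: (number of master-eligible nodes / 2) + 1, rounded down. A minimal sketch of that arithmetic follows; the function name `minimum_master_nodes` is only illustrative and is not part of any Elasticsearch API.

```python
def minimum_master_nodes(master_eligible: int) -> int:
    """Return the quorum size for a number of master-eligible nodes.

    Uses the rule floor(n / 2) + 1. The function name is hypothetical,
    chosen to mirror the discovery.zen.minimum_master_nodes setting.
    """
    if master_eligible < 1:
        raise ValueError("need at least one master-eligible node")
    return master_eligible // 2 + 1


# Print the quorum for a few cluster sizes.
for n in (1, 2, 3, 5):
    print(f"{n} master-eligible node(s) -> quorum {minimum_master_nodes(n)}")
```

With 3 master-eligible nodes this gives 2, so a master can still be elected while one of the three is down. With the setting at 3, all three nodes have to be up before a master can be elected.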

ghanshyam · October 2, 2017, 2:43pm

OK, I can try setting minimum_master_nodes to 2 and see what happens.

Christian_Dahlqvist · October 2, 2017, 2:52pm

It should allow the cluster to start up, but will probably not help the third master node join.

ghanshyam · October 2, 2017, 4:25pm

The cluster is up now, but all access is gone and we are getting this error:

HTTP status code: 403: authentication failed because the cluster is still recovering the security index

Christian_Dahlqvist · October 2, 2017, 4:26pm

I cannot help you with that, as I know nothing about your security arrangement.

ghanshyam · October 2, 2017, 4:53pm

thank you Christian!!

system · October 30, 2017, 4:53pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
