Master is not joining cluster

I have 3 master nodes. 2 are up, but the third one is not joining.
Node1: c810wem.int.thomsonreuters.com
node.name: ${HOSTNAME}-master-${ES_HTTP_PORT}
http.cors.enabled: true
tr.security.transport.ipfilter: "+n:c628agv.int.thomsonreuters.com, +n:c219usb.int.thomsonreuters.com, +n:c128rrs.int.thomsonreuters.com, +n:c810wem.int.thomsonreuters.com, +n:c014typ.int.thomsonreuters.com, +n:localhost, -n:"
path.plugins: ${INSTANCE_BASE}/plugins
cluster.routing.allocation.same_shard.host: true
bootstrap.mlockall: true
path.data: /data/es
cluster.name: es-WLN-TypeAhead-Lab
tr.security.authorization.cache.refresh_minutes: 5
http.port: ${ES_HTTP_PORT}
node.master: true
tr.security.enabled: true
discovery.zen.minimum_master_nodes: 3
tr.security.authentication.basic.enabled: true
tr.security.authentication.safe.enabled: true
http.cors.allow-credentials: true
path.logs: /log
http.cors.allow-headers: "X-Requested-With, Content-Type, Content-Length, Authorization"
+tr.security.authentication.safe.app_key: tyIBqgOBJbiBpbpCbL5/1K5XHQVsJk9fMZ6jb5PlEPg=;EYHm7MewFCWbw8KqI3B2Nw==
discovery.zen.ping.unicast.hosts: ["c628agv.int.thomsonreuters.com:9300", "c014typ.int.thomsonreuters.com:9301", "c810wem.int.thomsonreuters.com:9302"]
security.manager.enabled: false
node.appserver_instance: ${APPSERVER_INSTANCE}
network.host: 0
node.data: false
http.cors.allow-origin: "
"
tr.version: ${RELEASE}

Node2: c014typ.int.thomsonreuters.com

http.cors.enabled: true
tr.security.transport.ipfilter: "+n:c628agv.int.thomsonreuters.com, +n:c219usb.int.thomsonreuters.com, +n:c128rrs.int.thomsonreuters.com, +n:c810wem.int.thomsonreuters.com, +n:c014typ.int.thomsonreuters.com, +n:localhost, -n:"
path.plugins: ${INSTANCE_BASE}/plugins
cluster.routing.allocation.same_shard.host: true
bootstrap.mlockall: true
path.data: /data/es
cluster.name: es-WLN-TypeAhead-Lab
tr.security.authorization.cache.refresh_minutes: 5
http.port: ${ES_HTTP_PORT}
node.master: true
tr.security.enabled: true
discovery.zen.minimum_master_nodes: 3
tr.security.authentication.basic.enabled: true
tr.security.authentication.safe.enabled: true
http.cors.allow-credentials: true
path.logs: /log
http.cors.allow-headers: "X-Requested-With, Content-Type, Content-Length, Authorization"
discovery.zen.ping.unicast.hosts: ["c628agv.int.thomsonreuters.com:9300", "c014typ.int.thomsonreuters.com:9300", "c014typ.int.thomsonreuters.com:9301"]
security.manager.enabled: false
node.appserver_instance: ${APPSERVER_INSTANCE}
network.host: 0
node.data: false
http.cors.allow-origin: "
"
tr.version: ${RELEASE}
transport.tcp.port: ${ES_TRANSPORT_PORT}
discovery.zen.ping.multicast.enabled: false

Node 3, which is not joining: c628agv.int.thomsonreuters.com

http.cors.enabled: true
tr.security.transport.ipfilter: "+n:c628agv.int.thomsonreuters.com, +n:c219usb.int.thomsonreuters.com, +n:c128rrs.int.thomsonreuters.com, +n:c810wem.int.thomsonreuters.com, +n:c014typ.int.thomsonreuters.com, +n:localhost, -n:"
path.plugins: ${INSTANCE_BASE}/plugins
cluster.routing.allocation.same_shard.host: true
bootstrap.mlockall: true
path.data: /data/es
cluster.name: es-WLN-TypeAhead-Lab
tr.security.authorization.cache.refresh_minutes: 5
http.port: ${ES_HTTP_PORT}
node.master: true
tr.security.enabled: true
discovery.zen.minimum_master_nodes: 3
tr.security.authentication.basic.enabled: true
tr.security.authentication.safe.enabled: true
http.cors.allow-credentials: true
path.logs: /log
http.cors.allow-headers: "X-Requested-With, Content-Type, Content-Length, Authorization"
discovery.zen.ping.unicast.hosts: ["c628agv.int.thomsonreuters.com:9303", "c014typ.int.thomsonreuters.com:9301", "c810wem.int.thomsonreuters.com:9302"]
security.manager.enabled: false
node.appserver_instance: ${APPSERVER_INSTANCE}
node.data: false
http.cors.allow-origin: "
"
tr.version: ${RELEASE}
transport.tcp.port: ${ES_TRANSPORT_PORT}
discovery.zen.ping.multicast.enabled: false

and the error is:

    --> ping_response{node [{c014typ-client-9201}{zrNZVNhDQbeb2Q5WgOx-Fg}{10.204.101.87}{10.204.101.87:9301}{data=false, appserver_instance=es-WLN-TypeAhead-Lab_2.4.2.2_9401, master=true}], id[10848], master [{c810wem-master-9200}{GtZClXtTR-2iOz58_VFC7A}{10.204.101.86}{10.204.101.86:9300}{data=false, appserver_instance=es-WLN-TypeAhead-Lab_2.4.2.2_9400, master=true}], hasJoinedOnce [true], cluster_name[es-WLN-TypeAhead-Lab]}

[2017-10-01T18:57:26.295Z][DEBUG][discovery.zen.publish][c628agv-master-9200] received full cluster state version 82 with size 168267
[2017-10-01T18:57:26.297Z][DEBUG][discovery.zen.fd][c628agv-master-9200] [master] restarting fault detection against master [{c810wem-master-9200}{GtZClXtTR-2iOz58_VFC7A}{10.204.101.86}{10.204.101.86:9300}{data=false, appserver_instance=es-WLN-TypeAhead-Lab_2.4.2.2_9400, master=true}], reason [new cluster state received and we are monitoring the wrong master [null]]

o be in VALID state...
[2017-10-01T19:16:12.292Z][INFO][com.trgr.elasticsearch.plugin.security.SecurityBootstrapService][c628agv-master-9200] Waiting for Cluster to be in VALID state...
[2017-10-01T19:16:13.295Z][INFO][com.trgr.elasticsearch.plugin.security.SecurityBootstrapService][c628agv-master-9200] Waiting for Cluster to be in VALID state...
[2017-10-01T19:16:14.298Z][INFO][com.trgr.elasticsearch.plugin.security.SecurityBootstrapService][c628agv-master-9200] Waiting for Cluster to be in VALID state...
[2017-10-01T19:16:15.301Z][INFO][com.trgr.elasticsearch.plugin.security.SecurityBootstrapService][c628agv-master-9200] Waiting for Cluster to be in VALID state...
[2017-10-01T19:16:16.304Z][INFO][com.trgr.elasticsearch.plugin.security.SecurityBootstrapService][c628agv-master-9200] Waiting for Cluster to be in VALID state...
[2017-10-01T19:16:17.308Z][INFO][com.trgr.elasticsearch.plugin.security.SecurityBootstrapService][c628agv-master-9200] Waiting for Cluster to be in VALID state...
[2017-10-01T19:16:18.311Z][INFO][com.trgr.elasticsearch.plugin.security.SecurityBootstrapService][c628agv-master-9200] Waiting for Cluster to be in VALID state...

What are all the settings that begin with tr.?

These are our custom security settings, which we use instead of Shield.

If you have a completely custom security implementation in place that could impact the traffic within the cluster, I suspect it may be difficult for anyone here to be able to help out.

On another, probably unrelated note: assuming you have 3 master-eligible nodes, you should really set minimum_master_nodes to 2 instead of 3.
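
As a rough sketch of that change, assuming the usual quorum rule of (master-eligible nodes / 2) + 1 with integer division, and assuming the same value goes into elasticsearch.yml on each of the three master-eligible nodes, the line would look like this:

```yaml
# Quorum for 3 master-eligible nodes: (3 / 2) + 1 = 2 (integer division).
# Assumption: this value is set identically on all three master-eligible nodes.
discovery.zen.minimum_master_nodes: 2
```

With the value at 3, all three master-eligible nodes have to be visible to each other before a master can be elected, so one node failing to join is enough to leave the cluster without a master.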

OK, I can try setting minimum_master_nodes to 2 and check the status.

It should allow the cluster to start up, but will probably not help the third master node join.

The cluster is up now, but all access is gone and we are getting this error:

HTTP status code: 403: authentication failed because the cluster is still recovering the security index

I cannot help you with that, as I know nothing about your security arrangement.

thank you Christian!!
