Master is not joining cluster


(Ghanshyam Singh) #1

I have 3 master nodes. 2 are up, but the third one is not joining.
Node1: c810wem.int.thomsonreuters.com
node.name: ${HOSTNAME}-master-${ES_HTTP_PORT}
http.cors.enabled: true
tr.security.transport.ipfilter: "+n:c628agv.int.thomsonreuters.com, +n:c219usb.int.thomsonreuters.com, +n:c128rrs.int.thomsonreuters.com, +n:c810wem.int.thomsonreuters.com, +n:c014typ.int.thomsonreuters.com, +n:localhost, -n:"
path.plugins: ${INSTANCE_BASE}/plugins
cluster.routing.allocation.same_shard.host: true
bootstrap.mlockall: true
path.data: /data/es
cluster.name: es-WLN-TypeAhead-Lab
tr.security.authorization.cache.refresh_minutes: 5
http.port: ${ES_HTTP_PORT}
node.master: true
tr.security.enabled: true
discovery.zen.minimum_master_nodes: 3
tr.security.authentication.basic.enabled: true
tr.security.authentication.safe.enabled: true
http.cors.allow-credentials: true
path.logs: /log
http.cors.allow-headers: "X-Requested-With, Content-Type, Content-Length, Authorization"
tr.security.authentication.safe.app_key: tyIBqgOBJbiBpbpCbL5/1K5XHQVsJk9fMZ6jb5PlEPg=;EYHm7MewFCWbw8KqI3B2Nw==
discovery.zen.ping.unicast.hosts: ["c628agv.int.thomsonreuters.com:9300", "c014typ.int.thomsonreuters.com:9301", "c810wem.int.thomsonreuters.com:9302"]
security.manager.enabled: false
node.appserver_instance: ${APPSERVER_INSTANCE}
network.host: 0
node.data: false
http.cors.allow-origin: "
"
tr.version: ${RELEASE}

Node2: c014typ.int.thomsonreuters.com

http.cors.enabled: true
tr.security.transport.ipfilter: "+n:c628agv.int.thomsonreuters.com, +n:c219usb.int.thomsonreuters.com, +n:c128rrs.int.thomsonreuters.com, +n:c810wem.int.thomsonreuters.com, +n:c014typ.int.thomsonreuters.com, +n:localhost, -n:"
path.plugins: ${INSTANCE_BASE}/plugins
cluster.routing.allocation.same_shard.host: true
bootstrap.mlockall: true
path.data: /data/es
cluster.name: es-WLN-TypeAhead-Lab
tr.security.authorization.cache.refresh_minutes: 5
http.port: ${ES_HTTP_PORT}
node.master: true
tr.security.enabled: true
discovery.zen.minimum_master_nodes: 3
tr.security.authentication.basic.enabled: true
tr.security.authentication.safe.enabled: true
http.cors.allow-credentials: true
path.logs: /log
http.cors.allow-headers: "X-Requested-With, Content-Type, Content-Length, Authorization"
discovery.zen.ping.unicast.hosts: ["c628agv.int.thomsonreuters.com:9300", "c014typ.int.thomsonreuters.com:9300", "c014typ.int.thomsonreuters.com:9301"]
security.manager.enabled: false
node.appserver_instance: ${APPSERVER_INSTANCE}
network.host: 0
node.data: false
http.cors.allow-origin: "
"
tr.version: ${RELEASE}
transport.tcp.port: ${ES_TRANSPORT_PORT}
discovery.zen.ping.multicast.enabled: false

Node 3, which is not joining: c628agv.int.thomsonreuters.com

http.cors.enabled: true
tr.security.transport.ipfilter: "+n:c628agv.int.thomsonreuters.com, +n:c219usb.int.thomsonreuters.com, +n:c128rrs.int.thomsonreuters.com, +n:c810wem.int.thomsonreuters.com, +n:c014typ.int.thomsonreuters.com, +n:localhost, -n:"
path.plugins: ${INSTANCE_BASE}/plugins
cluster.routing.allocation.same_shard.host: true
bootstrap.mlockall: true
path.data: /data/es
cluster.name: es-WLN-TypeAhead-Lab
tr.security.authorization.cache.refresh_minutes: 5
http.port: ${ES_HTTP_PORT}
node.master: true
tr.security.enabled: true
discovery.zen.minimum_master_nodes: 3
tr.security.authentication.basic.enabled: true
tr.security.authentication.safe.enabled: true
http.cors.allow-credentials: true
path.logs: /log
http.cors.allow-headers: "X-Requested-With, Content-Type, Content-Length, Authorization"
discovery.zen.ping.unicast.hosts: ["c628agv.int.thomsonreuters.com:9303", "c014typ.int.thomsonreuters.com:9301", "c810wem.int.thomsonreuters.com:9302"]
security.manager.enabled: false
node.appserver_instance: ${APPSERVER_INSTANCE}
node.data: false
http.cors.allow-origin: "
"
tr.version: ${RELEASE}
transport.tcp.port: ${ES_TRANSPORT_PORT}
discovery.zen.ping.multicast.enabled: false

and the error is:

    --> ping_response{node [{c014typ-client-9201}{zrNZVNhDQbeb2Q5WgOx-Fg}{10.204.101.87}{10.204.101.87:9301}{data=false, appserver_instance=es-WLN-TypeAhead-Lab_2.4.2.2_9401, master=true}], id[10848], master [{c810wem-master-9200}{GtZClXtTR-2iOz58_VFC7A}{10.204.101.86}{10.204.101.86:9300}{data=false, appserver_instance=es-WLN-TypeAhead-Lab_2.4.2.2_9400, master=true}], hasJoinedOnce [true], cluster_name[es-WLN-TypeAhead-Lab]}

[2017-10-01T18:57:26.295Z][DEBUG][discovery.zen.publish][c628agv-master-9200] received full cluster state version 82 with size 168267
[2017-10-01T18:57:26.297Z][DEBUG][discovery.zen.fd][c628agv-master-9200] [master] restarting fault detection against master [{c810wem-master-9200}{GtZClXtTR-2iOz58_VFC7A}{10.204.101.86}{10.204.101.86:9300}{data=false, appserver_instance=es-WLN-TypeAhead-Lab_2.4.2.2_9400, master=true}], reason [new cluster state received and we are monitoring the wrong master [null]]

[2017-10-01T19:16:12.292Z][INFO][com.trgr.elasticsearch.plugin.security.SecurityBootstrapService][c628agv-master-9200] Waiting for Cluster to be in VALID state...
[2017-10-01T19:16:13.295Z][INFO][com.trgr.elasticsearch.plugin.security.SecurityBootstrapService][c628agv-master-9200] Waiting for Cluster to be in VALID state...
[2017-10-01T19:16:14.298Z][INFO][com.trgr.elasticsearch.plugin.security.SecurityBootstrapService][c628agv-master-9200] Waiting for Cluster to be in VALID state...
[2017-10-01T19:16:15.301Z][INFO][com.trgr.elasticsearch.plugin.security.SecurityBootstrapService][c628agv-master-9200] Waiting for Cluster to be in VALID state...
[2017-10-01T19:16:16.304Z][INFO][com.trgr.elasticsearch.plugin.security.SecurityBootstrapService][c628agv-master-9200] Waiting for Cluster to be in VALID state...
[2017-10-01T19:16:17.308Z][INFO][com.trgr.elasticsearch.plugin.security.SecurityBootstrapService][c628agv-master-9200] Waiting for Cluster to be in VALID state...
[2017-10-01T19:16:18.311Z][INFO][com.trgr.elasticsearch.plugin.security.SecurityBootstrapService][c628agv-master-9200] Waiting for Cluster to be in VALID state...


(Mark Walkom) #2

What are all the settings that begin with tr.?


(Ghanshyam Singh) #3

These are our custom security settings, which we use instead of Shield.


(Christian Dahlqvist) #4

If you have a completely custom security implementation in place that could impact the traffic within the cluster, I suspect it may be difficult for anyone here to be able to help out.

On another, probably unrelated, note: you should really have minimum_master_nodes set to 2 instead of 3, assuming you have 3 master-eligible nodes.
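
For reference, the value 2 comes from the usual quorum rule for zen discovery: a strict majority of the master-eligible nodes, i.e. (master-eligible nodes / 2) + 1 with integer division. A minimal sketch of that arithmetic (the function name is just illustrative, not part of Elasticsearch):

```python
def minimum_master_nodes(master_eligible: int) -> int:
    """Return the quorum (strict majority) for a number of master-eligible nodes."""
    if master_eligible < 1:
        raise ValueError("need at least one master-eligible node")
    # Integer division, then add one: 3 -> 2, 4 -> 3, 5 -> 3.
    return master_eligible // 2 + 1


quorum = minimum_master_nodes(3)
print(f"discovery.zen.minimum_master_nodes: {quorum}")
```

With the value set to 3 on a 3-node cluster, every master-eligible node has to be reachable before a master can be elected, so losing any single node blocks the election.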


(Ghanshyam Singh) #5

OK, I can try setting minimum_master_nodes to 2 and check the status.


(Christian Dahlqvist) #6

It should allow the cluster to start up, but will probably not help the third master node join.
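
One thing that can be checked without knowing anything about the security plugin is whether the discovery.zen.ping.unicast.hosts lists in the first post agree with the transport addresses shown in the ping response. The sketch below only compares the values as posted (domain suffix dropped for brevity; the host part of each observed address is taken from the node name in the log, which is an assumption). A mismatch here is not necessarily the cause of the problem, since the third node evidently did receive a cluster state from the master.

```python
# unicast.hosts as posted for each node in post #1 (domain suffix dropped).
unicast_hosts = {
    "c810wem": ["c628agv:9300", "c014typ:9301", "c810wem:9302"],
    "c014typ": ["c628agv:9300", "c014typ:9300", "c014typ:9301"],
    "c628agv": ["c628agv:9303", "c014typ:9301", "c810wem:9302"],
}

# Transport addresses that appear in the ping response in post #1.
# Host names are inferred from the node names (assumption).
observed = {"c014typ:9301", "c810wem:9300"}


def unmatched(observed_addrs, hosts_by_node):
    """Map each node to the observed addresses that its unicast list omits."""
    return {
        node: sorted(observed_addrs - set(hosts))
        for node, hosts in hosts_by_node.items()
    }


for node, missing in unmatched(observed, unicast_hosts).items():
    print(node, "does not list:", ", ".join(missing) or "nothing")
```

With the values as posted, none of the three lists contains c810wem on port 9300, which is where the log shows the current master listening, so the port numbers in those lists may be worth double-checking.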


(Ghanshyam Singh) #7

The cluster is up now, but all access is gone and we are getting this error:

HTTP status code: 403: authentication failed because the cluster is still recovering the security index


(Christian Dahlqvist) #8

That I cannot help you with, as I know nothing about your security arrangement.


(Ghanshyam Singh) #9

Thank you, Christian!

