New 2.1.1 cluster forms with errors [SOLVED]

(Kawika Ohumukini) #1

Node 1: Master/Data
Node 2: Master/Data
Node 3: Master

Nodes 1 and 2 form a cluster and have all the imported data; queries and Marvel look great.
When I bring up Node 3, _cluster/health on Nodes 1 and 2 shows 3 nodes and their logs look good. Marvel reports 3 nodes but doesn't show Node 3 in the Nodes list.

The problem is that Node 3's logs don't show cluster messages such as "master found", _cluster/health on Node 3 times out, and the node is generally unusable. It looks somewhat like split-brain, except that only Node 3 thinks it isn't in a cluster.
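One way to see the disagreement described above is to ask each node for _cluster/health directly, with a short client-side timeout, and compare what each one reports. A minimal Python sketch using only the standard library; the host URLs, port 9200 and the 10-second timeout are assumptions, not values from this thread:

```python
import json
import socket
import urllib.error
import urllib.request

# Hypothetical node addresses; substitute your own hosts and HTTP port.
HOSTS = ["http://node1:9200", "http://node2:9200", "http://node3:9200"]


def fetch_health(base_url, timeout=10.0):
    """Return the parsed _cluster/health body, or None if the node errors or doesn't answer in time."""
    try:
        with urllib.request.urlopen(base_url + "/_cluster/health", timeout=timeout) as resp:
            return json.loads(resp.read().decode("utf-8"))
    except (urllib.error.URLError, socket.timeout, TimeoutError):
        # Covers HTTP errors (e.g. a 503 when no master is known) and timeouts.
        return None


def summarize(results):
    """Map each host to the node count it reports, or 'no answer' if health failed."""
    return {
        host: (body["number_of_nodes"] if body is not None else "no answer")
        for host, body in results.items()
    }
```

Calling summarize({h: fetch_health(h) for h in HOSTS}) in the situation above would be expected to show 3 nodes from node1 and node2 and "no answer" from node3.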

Thanks for any ideas to try.

(Mark Walkom) #2

What do your configs look like?

(Kawika Ohumukini) #3

Here's what they look like. Comments show the differences between the three machines. - Thanks

node.name: "node1"  # node2 and node3
network.host:  # .12 and .13
action.destructive_requires_name: true
bootstrap.mlockall: true
cluster.name: analytics
discovery.zen.minimum_master_nodes: 2
discovery.zen.ping.multicast.enabled: false
discovery.zen.ping.timeout: 15s
discovery.zen.ping.unicast.hosts: [ "", "", ""]
gateway.expected_nodes: 2
gateway.recover_after_nodes: 1
gateway.recover_after_time: 5m
http.cors.enabled: true
index.cache.field.expire: 5m
index.number_of_replicas: 1
index.number_of_shards: 5
indices.fielddata.cache.size: 3GB
node.data: true  # false for node3
node.master: true
path.data: "/mnt/ssd/elasticsearch"
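With all three nodes master-eligible, the usual guidance for discovery.zen.minimum_master_nodes is a quorum of (master-eligible nodes / 2) + 1, which works out to 2 here. So the split-brain setting in this config looks consistent. A small Python sketch of that arithmetic; the NODES dictionary is a hypothetical summary of the three configs, not something Elasticsearch produces:

```python
def minimum_master_nodes(master_eligible):
    """Quorum of master-eligible nodes: (N // 2) + 1."""
    if master_eligible < 1:
        raise ValueError("need at least one master-eligible node")
    return master_eligible // 2 + 1


# Hypothetical summary of the three configs posted above.
NODES = {
    "node1": {"node.master": True, "node.data": True},
    "node2": {"node.master": True, "node.data": True},
    "node3": {"node.master": True, "node.data": False},
}


def check_quorum(nodes, configured):
    """Return (master-eligible count, recommended quorum, whether `configured` matches it)."""
    eligible = sum(1 for settings in nodes.values() if settings["node.master"])
    required = minimum_master_nodes(eligible)
    return eligible, required, configured == required
```

check_quorum(NODES, 2) gives (3, 2, True). With only node1 and node2 running, 2 of the 3 master-eligible nodes are present, which still meets that quorum and fits with those two forming a cluster on their own.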

(Kawika Ohumukini) #4

I turned on DEBUG logging on node3. The following set of messages just repeats:

[2015-12-26 20:11:53,687][DEBUG][transport.netty ] [node3] connected to node [{#zen_unicast_4#}{}{}]
[2015-12-26 20:11:53,687][DEBUG][transport.netty ] [node3] connected to node [{#zen_unicast_2#}{}{}]

...15 second pause...

[2015-12-26 20:22:08,702][DEBUG][transport.netty ] [node3] disconnecting from [{#zen_unicast_4#}{}{}] due to explicit disconnect call
[2015-12-26 20:22:08,702][DEBUG][discovery.zen ] [node3] filtered ping responses: (filter_client[true], filter_data[false])
--> ping_response{node [{node1}{jrPBHHpuQgqrYlJvA5Qhcg}{}{}{master=true}], id[365], master [{node2}{5_5TIHtiRC-FvuqZzzYjTw}{}{}{master=true}], hasJoinedOnce [true], cluster_name[analytics]}
--> ping_response{node [{node2}{5_5TIHtiRC-FvuqZzzYjTw}{}{}{master=true}], id[364], master [{node2}{5_5TIHtiRC-FvuqZzzYjTw}{}{}{master=true}], hasJoinedOnce [true], cluster_name[analytics]}
[2015-12-26 20:22:08,702][DEBUG][transport.netty ] [node3] disconnecting from [{#zen_unicast_2#}{}{}] due to explicit disconnect call
[2015-12-26 20:22:08,707][DEBUG][discovery.zen.publish ] [node3] received diff for but don't have any local cluster state - requesting full state
[2015-12-26 20:22:08,812][DEBUG][cluster.service ] [node3] processing [finalize_join ({node2}{5_5TIHtiRC-FvuqZzzYjTw}{}{}{master=true})]: execute
[2015-12-26 20:22:08,812][DEBUG][discovery.zen ] [node3] no master node is set, despite of join request completing. retrying pings.
[2015-12-26 20:22:08,812][DEBUG][cluster.service ] [node3] processing [finalize_join ({node2}{5_5TIHtiRC-FvuqZzzYjTw}{}{}{master=true})]: took 0s no change in cluster_state
[2015-12-26 20:22:08,817][DEBUG][transport.netty ] [node3] connected to node [{#zen_unicast_2#}{}{}]
[2015-12-26 20:22:08,818][DEBUG][transport.netty ] [node3] connected to node [{#zen_unicast_4#}{}{}]
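In each cycle, node3's pings find node2 as master and it runs finalize_join. It then logs that no master node is set and starts pinging again. A rough Python sketch for counting how often that cycle repeats in a log file. The regular expression is an assumption based only on the line format visible in this thread, not on Elasticsearch's documented logging pattern:

```python
import re
from collections import Counter

# Assumed line shape, taken from the excerpt in this thread:
# [timestamp][LEVEL][logger name, space-padded] [node name] message
LOG_LINE = re.compile(
    r"^\[(?P<ts>[^\]]+)\]\[(?P<level>\w+)\s*\]"
    r"\[(?P<logger>[^\]]+?)\s*\] \[(?P<node>[^\]]+)\] (?P<msg>.*)$"
)

# Message node3 logs each time a join completes without a master being set.
RETRY_MARKER = "no master node is set"


def join_retries(lines):
    """Count, per node name, how many log lines report the join/retry loop."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        # Continuation lines such as "--> ping_response{...}" don't match and are skipped.
        if match and RETRY_MARKER in match.group("msg"):
            counts[match.group("node")] += 1
    return counts
```

Passing it the lines of node3's log file (for example join_retries(open(path)), where path is a placeholder) should show the count for node3 growing while the loop continues.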

(Kawika Ohumukini) #5

I hadn't installed the license on the third node. After installing it there, everything works.
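Because the fix turned out to be the missing license on node3, a quick consistency check is to list the plugins each node reports and flag any node without license. A minimal Python sketch against the nodes info API (GET _nodes/plugins). The base URL is hypothetical, and a query like this only covers nodes that the node you ask can actually reach:

```python
import json
import urllib.request


def fetch_nodes_plugins(base_url, timeout=10.0):
    """Fetch and parse GET /_nodes/plugins from one node (base_url is hypothetical)."""
    with urllib.request.urlopen(base_url + "/_nodes/plugins", timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


def nodes_missing_plugin(nodes_info, plugin="license"):
    """Return the sorted names of nodes whose plugin list does not include `plugin`."""
    missing = []
    for node in nodes_info.get("nodes", {}).values():
        installed = {p.get("name") for p in node.get("plugins", [])}
        if plugin not in installed:
            missing.append(node.get("name"))
    return sorted(missing)
```

For example, nodes_missing_plugin(fetch_nodes_plugins("http://node1:9200")) would be expected to return ["node3"] in the state described before the fix.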
