Hi,
I have a bit of an issue.
I have ES running on a cluster at the office, consisting of 4 nodes (2 data and 3 master-eligible).
I am using Graylog as the data collector and ES to store the information.
This is my current setup:
Data Nodes (Node Names):
Gray-003
Gray-006
Master Eligible (Node Names):
Gray-003 (Data and Master)
Gray-004 (Master only, no data)
Gray-005 (Master only, no data)
My problem is that all the other ES nodes can see Gray-006, but every few hours Gray-006 stops seeing the other ES servers. Weird, right?
(This picture shows KOPF from Gray-003. It can see Gray-006 and its stats.)
However, KOPF on Gray-006 says it's offline (192.168.32.73).
After a few minutes or so it reconnects and KOPF picks it up again.
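For what it's worth, while it's in that state I check Gray-006's own view of the cluster with the commands below (localhost and the default port 9200 are assumptions based on my setup):

```shell
# Ask the local ES instance on Gray-006 for its view of cluster health
curl -s 'http://localhost:9200/_cluster/health?pretty'

# Show which node it currently believes is the master (empty if none)
curl -s 'http://localhost:9200/_cat/master?v'
```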
Now, all the other ES servers are at a data centre; the data-only node (Gray-006) is at our office and is connected over a leased line. The line has no issues whatsoever, as we run all the office file servers etc. from the DC with no problems at all.
When I cycle the deflector to create an index (as I have 4 primaries and 2 replicas), the shards get spread out evenly over the 2 data nodes. However, since I have an automatic rotation every 24 hours, when it rotates, all the primary shards get allocated to Gray-003 and all the replicas end up on Gray-006.
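For reference, this is how I check where the shards land after each rotation (again, localhost and the default port are assumptions):

```shell
# List every shard with the node it lives on; 'p' marks primaries, 'r' marks replicas
curl -s 'http://localhost:9200/_cat/shards?v'
```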
I have tried increasing the zen ping timeout on Gray-006, but it still disconnects from the cluster every few hours, and when it comes back up I can't search the ES data from Graylog because it says it's out of sync.
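For reference, these are roughly the discovery settings I have been bumping in elasticsearch.yml on Gray-006 (the values are just examples of what I tried, not recommendations):

```yaml
# Zen discovery fault-detection tuning (ES 1.x style settings; values are examples)
discovery.zen.ping.timeout: 10s      # timeout for the initial ping when (re)joining
discovery.zen.fd.ping_timeout: 60s   # wait this long for each fault-detection ping
discovery.zen.fd.ping_retries: 6     # pings that may fail before the node is dropped
discovery.zen.fd.ping_interval: 5s   # how often fault-detection pings are sent
```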
In the ES logs on Gray-006, I keep getting:
[2016-01-18 10:00:30,131][DEBUG][action.admin.cluster.node.stats] [ess-ukh-gray-006_DATA] failed to execute on node [_ahdlW2dTjiMT2KHhGMprA]
org.elasticsearch.transport.NodeDisconnectedException: [gray004][inet[/192.168.16.131:9350]][cluster:monitor/nodes/stats[n]] disconnected
Any help is greatly appreciated,
Thank you,
P.S.
I also keep getting transport disconnect errors when Gray-006 gets removed from the cluster:
[2016-01-18 11:14:47,583][INFO ][discovery.zen ] [ess-ukh-gray-006_DATA] master_left [[ess-lon-gray-004s][EBo7CIrURj-Su2uV3vozKw][ess-lon-gray-004][inet[/192.168.16.131:9300]]{data=false, master=true}], reason [transport disconnected]
[2016-01-18 11:14:47,585][WARN ][discovery.zen ] [ess-ukh-gray-006_DATA] master left (reason = transport disconnected), current nodes: {[ess-ukh-gray-006_DATA][HJBoxbBKRiibKVnJe3S2qA][ess-ukh-gray-006][inet[/192.168.32.73:9300]]{master=false},[gray-006_Data][frkMUcP9Sma-VsJv2mPvQQ][ess-ukh-gray-006][inet[/192.168.32.73:9350]]{client=true, data=false, master=false},[gray004][_ahdlW2dTjiMT2KHhGMprA][ess-lon-gray-004][inet[/192.168.16.131:9350]]{client=true, data=false, master=false},[gray-003][EwImG6QwS4Clqmsaw3snXA][ess-lon-gray-003][inet[/192.168.16.130:9350]]{client=true, data=false, master=false},[ess-lon-gray-005][lKqyqfY6SzeYg3ejXsDwcA][ess-lon-gray-005][inet[/192.168.16.132:9300]]{data=false, master=true},[ess-lon-gray-003_master][KV90ufrQQc-g1aPV2RoyGA][ess-lon-gray-003][inet[/192.168.16.130:9300]]{master=true},}
[2016-01-18 11:14:47,585][INFO ][cluster.service ] [ess-ukh-gray-006_DATA] removed {[ess-lon-gray-004s][EBo7CIrURj-Su2uV3vozKw][ess-lon-gray-004][inet[/192.168.16.131:9300]]{data=false, master=true},}, reason: zen-disco-master_failed ([ess-lon-gray-004s][EBo7CIrURj-Su2uV3vozKw][ess-lon-gray-004][inet[/192.168.16.131:9300]]{data=false, master=true})
[2016-01-18 11:14:50,887][DEBUG][action.admin.cluster.state] [ess-ukh-gray-006_DATA] no known master node, scheduling a retry
[2016-01-18 11:14:50,902][DEBUG][action.admin.cluster.state] [ess-ukh-gray-006_DATA] no known master node, scheduling a retry
[2016-01-18 11:14:50,903][DEBUG][action.admin.indices.get ] [ess-ukh-gray-006_DATA] no known master node, scheduling a retry
[2016-01-18 11:14:50,907][DEBUG][action.admin.cluster.health] [ess-ukh-gray-006_DATA] no known master node, scheduling a retry
[2016-01-18 11:15:02,623][INFO ][cluster.service ] [ess-ukh-gray-006_DATA] detected_master [ess-lon-gray-004s][
Thanks