On our live cluster we have recently encountered some connectivity issues.
Our cluster is spread across two physical data centres, connected by a
dedicated private link that is 'fairly' reliable. From the various comments
I've read so far, a multi-data-centre cluster is not currently advised, so
I'm anticipating that kind of response.
Below is the log from one of our 'coordinator' nodes (i.e. data=false,
master=true). It shows that node's zen discovery failing for nodes in the
other data centre, with a reason of 'transport disconnected (with verified
connect)'. The monitoring data for our data centre link shows that no link
outage occurred in that time period, though there was some increase in
traffic across the link.
Having looked at the ES source code (and I have seen other posts that
support this), I would expect a different message if the failed
connectivity were due to the traffic increase: the ping request would have
timed out ('failed to ping [{}], tried [{}] times, each with maximum [{}]').
So my questions are:
What are the likely causes of a 'transport disconnected' failure in this
situation? And is it naive of me to expect an ES cluster to work over this
architecture?
(Note: I've changed some of the IP addresses for security reasons and
reformatted the log to aid readability.)
[2013-09-15 19:34:02,845][INFO ][cluster.service] [live_SQLWOK11_coordinator]
removed
{
  [live_SQLLIVE25][FDmaoEyDSni109A5nY_kcg][inet[/999.86.1.40:53028]]{datacentrename=London, nodename=live_SQLLIVE25, master=false},
},
reason:
  zen-disco-node_failed([live_SQLLIVE25][FDmaoEyDSni109A5nY_kcg][inet[/999.86.1.40:53028]]{datacentrename=London, nodename=live_SQLLIVE25, master=false}),
  reason transport disconnected (with verified connect)

[2013-09-15 19:34:03,140][INFO ][cluster.service] [live_SQLWOK11_coordinator]
removed
{
  [live_SQLLIVE24_loadbalancer][KgZ0hKRuRj6sIafa7eXQyA][inet[/999.86.1.38:54604]]{datacentrename=London, data=false, nodename=live_SQLLIVE24_loadbalancer, master=false},
},
reason:
  zen-disco-node_failed([live_SQLLIVE24_loadbalancer][KgZ0hKRuRj6sIafa7eXQyA][inet[/999.86.1.38:54604]]{datacentrename=London, data=false, nodename=live_SQLLIVE24_loadbalancer, master=false}),
  reason transport disconnected (with verified connect)
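
In case it helps anyone wanting to tally these removals against link monitoring data, here is a minimal Python sketch that pulls the timestamp, failed node name and final reason out of entries shaped like the two above. The names parse_removals and SAMPLE_LOG are my own, and the regular expressions assume the layout as pasted in this post rather than the raw ES log format, so treat it as a starting point only.

```python
import re
from collections import Counter

# Sample text copied from the log in this post (IP addresses already altered).
SAMPLE_LOG = """\
[2013-09-15 19:34:02,845][INFO ][cluster.service]
[live_SQLWOK11_coordinator]
removed
{
[live_SQLLIVE25][FDmaoEyDSni109A5nY_kcg][inet[/999.86.1.40:53028]]{datacentrename=London,
nodename=live_SQLLIVE25, master=false},
},
reason:
zen-disco-node_failed([live_SQLLIVE25][FDmaoEyDSni109A5nY_kcg][inet[/999.86.1.40:53028]]{datacentrename=London,
nodename=live_SQLLIVE25, master=false}),
reason transport disconnected (with verified connect)
[2013-09-15 19:34:03,140][INFO ][cluster.service]
[live_SQLWOK11_coordinator]
removed
{
[live_SQLLIVE24_loadbalancer][KgZ0hKRuRj6sIafa7eXQyA][inet[/999.86.1.38:54604]]{datacentrename=London,
data=false, nodename=live_SQLLIVE24_loadbalancer, master=false},
},
reason:
zen-disco-node_failed([live_SQLLIVE24_loadbalancer][KgZ0hKRuRj6sIafa7eXQyA][inet[/999.86.1.38:54604]]{datacentrename=London,
data=false, nodename=live_SQLLIVE24_loadbalancer, master=false}),
reason transport disconnected (with verified connect)
"""

# An entry starts at a line beginning with a bracketed timestamp.
ENTRY_START = re.compile(
    r"^\[(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})\]", re.MULTILINE
)
# The failed node's name is the first bracketed token after the marker.
FAILED_NODE = re.compile(r"zen-disco-node_failed\(\[([^\]]+)\]")
# The final reason follows the closing parenthesis of the node description.
FINAL_REASON = re.compile(r"\),\s*reason\s+(.*)")


def parse_removals(log_text):
    """Return (timestamp, node_name, reason) for each node-failed entry."""
    starts = list(ENTRY_START.finditer(log_text))
    events = []
    for i, match in enumerate(starts):
        # Each entry runs up to the next timestamp, or to the end of the text.
        end = starts[i + 1].start() if i + 1 < len(starts) else len(log_text)
        entry = log_text[match.start():end]
        node = FAILED_NODE.search(entry)
        reason = FINAL_REASON.search(entry)
        if node and reason:
            events.append(
                (match.group(1), node.group(1), reason.group(1).strip())
            )
    return events


if __name__ == "__main__":
    events = parse_removals(SAMPLE_LOG)
    for timestamp, node, reason in events:
        print(timestamp, node, reason)
    # Count how often each reason occurs across all parsed entries.
    print(Counter(reason for _, _, reason in events))
```

Counting by reason makes it easy to see whether a given period produced only 'transport disconnected' removals or also ping timeouts, which is the distinction I am trying to understand.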