Recurring Split-Brain Issue on 2 VMs in Windows Azure (Unicast Discovery)

We're running ES on two Azure VMs using unicast discovery, with ports
9200 and 9300 open on both nodes.

The initial start is always successful, but we've started to notice a
split-brain pattern emerging a few hours into each deployment.

The configuration is as follows:

{"path":{"data":"F:\","work":"F:\"},"cluster":{"name":"FiveAces.Coffee.Web"},"node":{"name":"FiveAces.Coffee.Web_IN_0"},"discovery":{"zen":{"ping":{"multicast":{"enabled":false},"unicast":{"hosts":["10.241.238.26","10.241.182.18"]}}}}}

And this is the pattern that recurs almost every time (and subsequently
leads to data loss after each side is restarted):

NODE1:

[2013-05-05 22:00:47,123][INFO ][node ] [FiveAces.Coffee.Web_IN_1] {0.90.0}[2164]: initializing ...
[2013-05-05 22:00:47,524][INFO ][plugins ] [FiveAces.Coffee.Web_IN_1] loaded [], sites [head]
[2013-05-05 22:00:53,259][INFO ][node ] [FiveAces.Coffee.Web_IN_1] {0.90.0}[2164]: initialized
[2013-05-05 22:00:53,264][INFO ][node ] [FiveAces.Coffee.Web_IN_1] {0.90.0}[2164]: starting ...
[2013-05-05 22:00:53,506][INFO ][transport ] [FiveAces.Coffee.Web_IN_1] bound_address {inet[/0:0:0:0:0:0:0:0:9300]}, publish_address {inet[/10.241.182.18:9300]}
[2013-05-05 22:01:00,800][INFO ][discovery.zen ] [FiveAces.Coffee.Web_IN_1] master_left [[FiveAces.Coffee.Web_IN_0][CVJT6uiFR4OEAzzXyRL_yQ][inet[/10.241.238.26:9300]]], reason [do not exists on master, act as master failure]
[2013-05-05 22:01:00,831][INFO ][cluster.service ] [FiveAces.Coffee.Web_IN_1] detected_master [FiveAces.Coffee.Web_IN_0][CVJT6uiFR4OEAzzXyRL_yQ][inet[/10.241.238.26:9300]], added {[FiveAces.Coffee.Web_IN_0][CVJT6uiFR4OEAzzXyRL_yQ][inet[/10.241.238.26:9300]],}, reason: zen-disco-receive(from master [[FiveAces.Coffee.Web_IN_0][CVJT6uiFR4OEAzzXyRL_yQ][inet[/10.241.238.26:9300]]])
[2013-05-05 22:01:00,911][INFO ][discovery ] [FiveAces.Coffee.Web_IN_1] FiveAces.Coffee.Web/lH-lp_jsQ4WhwwFJO3B-kA
[2013-05-05 22:01:00,912][INFO ][cluster.service ] [FiveAces.Coffee.Web_IN_1] master {new [FiveAces.Coffee.Web_IN_1][lH-lp_jsQ4WhwwFJO3B-kA][inet[/10.241.182.18:9300]], previous [FiveAces.Coffee.Web_IN_0][CVJT6uiFR4OEAzzXyRL_yQ][inet[/10.241.238.26:9300]]}, removed {[FiveAces.Coffee.Web_IN_0][CVJT6uiFR4OEAzzXyRL_yQ][inet[/10.241.238.26:9300]],}, reason: zen-disco-master_failed ([FiveAces.Coffee.Web_IN_0][CVJT6uiFR4OEAzzXyRL_yQ][inet[/10.241.238.26:9300]])
[2013-05-05 22:01:02,352][INFO ][http ] [FiveAces.Coffee.Web_IN_1] bound_address {inet[/0:0:0:0:0:0:0:0:9200]}, publish_address {inet[/10.241.182.18:9200]}
[2013-05-05 22:01:02,354][INFO ][node ] [FiveAces.Coffee.Web_IN_1] {0.90.0}[2164]: started
[2013-05-05 22:01:03,912][WARN ][discovery.zen ] [FiveAces.Coffee.Web_IN_1] received cluster state from [[FiveAces.Coffee.Web_IN_0][CVJT6uiFR4OEAzzXyRL_yQ][inet[/10.241.238.26:9300]]] which is also master but with an older cluster_state, telling [[FiveAces.Coffee.Web_IN_0][CVJT6uiFR4OEAzzXyRL_yQ][inet[/10.241.238.26:9300]]] to rejoin the cluster
[2013-05-05 22:01:03,965][WARN ][discovery.zen ] [FiveAces.Coffee.Web_IN_1] received cluster state from [[FiveAces.Coffee.Web_IN_0][CVJT6uiFR4OEAzzXyRL_yQ][inet[/10.241.238.26:9300]]] which is also master but with an older cluster_state, telling [[FiveAces.Coffee.Web_IN_0][CVJT6uiFR4OEAzzXyRL_yQ][inet[/10.241.238.26:9300]]] to rejoin the cluster
[2013-05-05 22:01:03,966][WARN ][discovery.zen ] [FiveAces.Coffee.Web_IN_1] received cluster state from [[FiveAces.Coffee.Web_IN_0][CVJT6uiFR4OEAzzXyRL_yQ][inet[/10.241.238.26:9300]]] which is also master but with an older cluster_state, telling [[FiveAces.Coffee.Web_IN_0][CVJT6uiFR4OEAzzXyRL_yQ][inet[/10.241.238.26:9300]]] to rejoin the cluster
[2013-05-05 22:01:03,966][WARN ][discovery.zen ] [FiveAces.Coffee.Web_IN_1] failed to send rejoin request to [[FiveAces.Coffee.Web_IN_0][CVJT6uiFR4OEAzzXyRL_yQ][inet[/10.241.238.26:9300]]]
org.elasticsearch.transport.SendRequestTransportException: [FiveAces.Coffee.Web_IN_0][inet[/10.241.238.26:9300]][discovery/zen/rejoin]
    at org.elasticsearch.transport.TransportService.sendRequest(TransportService.java:199)
    at org.elasticsearch.transport.TransportService.sendRequest(TransportService.java:171)
    at org.elasticsearch.discovery.zen.ZenDiscovery$7.execute(ZenDiscovery.java:527)
    at org.elasticsearch.cluster.service.InternalClusterService$2.run(InternalClusterService.java:229)
    at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.run(PrioritizedEsThreadPoolExecutor.java:95)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
    at java.lang.Thread.run(Unknown Source)
Caused by: org.elasticsearch.transport.NodeNotConnectedException: [FiveAces.Coffee.Web_IN_0][inet[/10.241.238.26:9300]] Node not connected
    at org.elasticsearch.transport.netty.NettyTransport.nodeChannel(NettyTransport.java:788)
    at org.elasticsearch.transport.netty.NettyTransport.sendRequest(NettyTransport.java:522)
    at org.elasticsearch.transport.TransportService.sendRequest(TransportService.java:184)
    ... 7 more
[2013-05-05 22:01:04,072][WARN ][discovery.zen ] [FiveAces.Coffee.Web_IN_1] failed to send rejoin request to [[FiveAces.Coffee.Web_IN_0][CVJT6uiFR4OEAzzXyRL_yQ][inet[/10.241.238.26:9300]]]
org.elasticsearch.transport.SendRequestTransportException: [FiveAces.Coffee.Web_IN_0][inet[/10.241.238.26:9300]][discovery/zen/rejoin]
    at org.elasticsearch.transport.TransportService.sendRequest(TransportService.java:199)
    at org.elasticsearch.transport.TransportService.sendRequest(TransportService.java:171)
    at org.elasticsearch.discovery.zen.ZenDiscovery$7.execute(ZenDiscovery.java:527)
    at org.elasticsearch.cluster.service.InternalClusterService$2.run(InternalClusterService.java:229)
    at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.run(PrioritizedEsThreadPoolExecutor.java:95)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
    at java.lang.Thread.run(Unknown Source)
Caused by: org.elasticsearch.transport.NodeNotConnectedException: [FiveAces.Coffee.Web_IN_0][inet[/10.241.238.26:9300]] Node not connected
    at org.elasticsearch.transport.netty.NettyTransport.nodeChannel(NettyTransport.java:788)
    at org.elasticsearch.transport.netty.NettyTransport.sendRequest(NettyTransport.java:522)
    at org.elasticsearch.transport.TransportService.sendRequest(TransportService.java:184)
    ... 7 more

NODE2:

[2013-05-05 22:00:48,930][INFO ][node ] [FiveAces.Coffee.Web_IN_0] {0.90.0}[2640]: initializing ...
[2013-05-05 22:00:49,133][INFO ][plugins ] [FiveAces.Coffee.Web_IN_0] loaded [], sites [head]
[2013-05-05 22:00:54,125][INFO ][node ] [FiveAces.Coffee.Web_IN_0] {0.90.0}[2640]: initialized
[2013-05-05 22:00:54,127][INFO ][node ] [FiveAces.Coffee.Web_IN_0] {0.90.0}[2640]: starting ...
[2013-05-05 22:00:54,310][INFO ][transport ] [FiveAces.Coffee.Web_IN_0] bound_address {inet[/0:0:0:0:0:0:0:0:9300]}, publish_address {inet[/10.241.238.26:9300]}
[2013-05-05 22:00:57,389][INFO ][cluster.service ] [FiveAces.Coffee.Web_IN_0] new_master [FiveAces.Coffee.Web_IN_0][CVJT6uiFR4OEAzzXyRL_yQ][inet[/10.241.238.26:9300]], reason: zen-disco-join (elected_as_master)
[2013-05-05 22:00:57,406][INFO ][discovery ] [FiveAces.Coffee.Web_IN_0] FiveAces.Coffee.Web/CVJT6uiFR4OEAzzXyRL_yQ
[2013-05-05 22:00:58,618][INFO ][http ] [FiveAces.Coffee.Web_IN_0] bound_address {inet[/0:0:0:0:0:0:0:0:9200]}, publish_address {inet[/10.241.238.26:9200]}
[2013-05-05 22:00:58,619][INFO ][node ] [FiveAces.Coffee.Web_IN_0] {0.90.0}[2640]: started
[2013-05-05 22:01:00,532][INFO ][gateway ] [FiveAces.Coffee.Web_IN_0] recovered [1] indices into cluster_state
[2013-05-05 22:01:00,794][INFO ][cluster.service ] [FiveAces.Coffee.Web_IN_0] added {[FiveAces.Coffee.Web_IN_1][lH-lp_jsQ4WhwwFJO3B-kA][inet[/10.241.182.18:9300]],}, reason: zen-disco-receive(join from node[[FiveAces.Coffee.Web_IN_1][lH-lp_jsQ4WhwwFJO3B-kA][inet[/10.241.182.18:9300]]])

Could this be tied to the transport module binding to the 9300-9400 port
range by default? I'm wondering if I should give it a narrow range of one
or two ports that I can then open between the two VMs.
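
Something like this is what I have in mind, if pinning the port is
sensible (a minimal elasticsearch.yml sketch; the single port 9300 is
just an example value, not something we've verified fixes anything):

# elasticsearch.yml sketch: bind the transport module to one known port
# instead of letting it pick from the default 9300-9400 range, so only
# that one port needs to be opened between the two VMs.
transport.tcp.port: 9300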

The data loss that results from this is unacceptable; any suggestions on
how to avoid the split-brain scenario (other than moving to three nodes)
would be appreciated.
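
The only zen-level safeguard I'm aware of is
discovery.zen.minimum_master_nodes; a sketch of what we could try is
below, though on a two-node cluster a quorum of 2 obviously means that
when either node is down, the survivor refuses to act as master:

# elasticsearch.yml sketch (untested assumption on our part):
# require 2 master-eligible nodes before a master can be elected,
# so an isolated node can never promote itself and split the cluster.
discovery.zen.minimum_master_nodes: 2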

Regards,
N.


We use the elasticsearch-zookeeper plugin to address the split-brain
issue: https://github.com/sonian/elasticsearch-zookeeper. It helps avoid
split-brain because the master election process is externalised to the
ZooKeeper service.
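
If it helps, this is roughly the shape of the configuration, as an
elasticsearch.yml sketch based on my reading of the plugin's README;
the exact setting names and the ZooKeeper hosts shown are assumptions,
so verify them against the repo for your plugin version:

# elasticsearch.yml sketch: hand master election over to ZooKeeper.
# Setting names follow the sonian/elasticsearch-zookeeper README as I
# recall it; double-check them, and substitute your own ZK ensemble.
sonian.elasticsearch.zookeeper.client.host: zk1:2181,zk2:2181,zk3:2181
discovery.type: com.sonian.elasticsearch.zookeeper.discovery.ZooKeeperDiscoveryModule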
