How to make tribe nodes resilient to network partitioning between clusters


(Jae) #1

Hi pros

I am running two Elasticsearch clusters, each in a different data center. I set up tribe nodes in each data center for cross-data-center search, and they are running well. Let's say we have dc1 and dc2: cluster1 runs in dc1 with 3 master nodes and a bunch of data nodes, and cluster2 runs in dc2 with 3 master nodes and a bunch of data nodes as well.

I set up three tribe nodes in cluster1 to connect to both cluster1 and cluster2. Similarly, I set up three tribe nodes in cluster2 to connect to both cluster1 and cluster2.
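
For reference, a minimal sketch of what the tribe section of elasticsearch.yml looks like on one such tribe node. The tribe names match the ones that show up in the logs below; the host names are placeholders, and unicast discovery is an assumption about the setup.

```yaml
# elasticsearch.yml on a tribe node (sketch only; host names are placeholders)
tribe:
  cluster1:
    cluster.name: cluster1
    # assumed: unicast discovery pointing at the three cluster1 masters
    discovery.zen.ping.unicast.hosts: ["dc1-master1:9300", "dc1-master2:9300", "dc1-master3:9300"]
  cluster2:
    cluster.name: cluster2
    # assumed: unicast discovery pointing at the three cluster2 masters
    discovery.zen.ping.unicast.hosts: ["dc2-master1:9300", "dc2-master2:9300", "dc2-master3:9300"]
```

The tribe nodes in the other data center use the same two entries, so every tribe node holds a connection to both clusters.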

However, I observed that all tribe nodes stopped working when the network between the two data centers was partitioned. I am attaching sample error logs from the partition below. I was expecting each tribe node to keep working as an isolated search node within its own data center, but I was wrong. Is there any way to make tribe nodes resilient to network partitioning?

[2016-03-17 17:41:13,153][WARN ][discovery.zen ] [node1/cluster2] master left (reason = failed to ping, tried [3] times, each with maximum [30s] timeout),
[2016-03-17 17:41:13,154][INFO ][cluster.service ] [node1/cluster2] removed {[node2][sM5stptOQJ2gw7RhY5gfUw][node2][inet[/IP2:9300]]{data=false, rack=as11, master=true},}, reason: zen-disco-master_failed ([node2][sM5stptOQJ2gw7RhY5gfUw][node2][inet[/IP2:9300]]{data=false, rack=as11, master=true})
[2016-03-17 17:41:13,157][INFO ][tribe ] [node1] [cluster2] removing node [[node2][sM5stptOQJ2gw7RhY5gfUw][node2][inet[/IP2:9300]]{tribe.name=cluster2, data=false, rack=as11, master=true}]
[2016-03-17 17:41:13,162][INFO ][cluster.service ] [node1] removed {[node2][sM5stptOQJ2gw7RhY5gfUw][node2][inet[/IP1:9300]]{tribe.name=cluster2, data=false, rack=as11, master=true},}, reason: cluster event from cluster21, zen-disco-master_failed ([node21][sM5stptOQJ2gw7RhY5gfUw][node2][inet[/IP2:9300]]{data=false, rack=as11, master=true})
[2016-03-17 17:41:19,404][WARN ][discovery.zen.ping.unicast] [node1/cluster2] failed to send ping to [[#zen_unicast_1#][node1][inet[node2/IP29300]]]

