Hi all,
I have been seeing some strange behaviour running elasticsearch on AWS.
The setup is three nodes, each with the following settings:
cluster.name: <clustername>
bootstrap.mlockall: true
discovery.zen.ping.multicast.enabled: false
discovery.type: ec2
discovery.ec2.ping_timeout: 30s
discovery.ec2.groups: <group>
cloud.aws.region: <region>
action.disable_delete_all_indices: true
discovery.zen.minimum_master_nodes: 2
I have witnessed two occurrences of the following.
Given three nodes A, B and C, all in the same availability zone:
- To start with, all nodes are connected in the cluster and A is the master.
- For some reason, nodes A and B can no longer talk to each other, but both
can still talk to C, and C can still talk to both A and B. In other words, an
'on the fence' network partition, as C can still see every node:
A:
[2013-11-17 20:23:28,257][INFO ][cluster.service ] [A] removed {[B][sUv4amcFSdmaDAVDa7bUVg][inet[/<ipaddress>:9300]],}, reason: zen-disco-node_failed([B][sUv4amcFSdmaDAVDa7bUVg][inet[/<ipaddress>:9300]]), reason failed to ping, tried [3] times, each with maximum [30s] timeout
B:
[2013-11-17 20:25:27,543][INFO ][discovery.ec2 ] [B] master_left [[A][O25rauSQR7utohD0jg4RQw][inet[/<ipaddress>:9300]]], reason [failed to ping, tried [3] times, each with maximum [30s] timeout]
[2013-11-17 20:25:27,547][INFO ][cluster.service ] [B] master {new [B][sUv4amcFSdmaDAVDa7bUVg][inet[/<ipaddress>:9300]], previous [A][O25rauSQR7utohD0jg4RQw][inet[/<ipaddress>:9300]]}, removed {[A][O25rauSQR7utohD0jg4RQw][inet[/<ipaddress>:9300]],}, reason: zen-disco-master_failed ([A][O25rauSQR7utohD0jg4RQw][inet[/<ipaddress>:9300]])
C:
[2013-11-17 20:23:28,256][INFO ][cluster.service ] [C] removed {[B][sUv4amcFSdmaDAVDa7bUVg][inet[/<ipaddress>:9300]],}, reason: zen-disco-receive(from master [[A][O25rauSQR7utohD0jg4RQw][inet[/<ipaddress>:9300]]])
As you can see, B has elected itself as a new master, but A has not stepped
down as master: A can still see C, so it still satisfies the
minimum_master_nodes criterion of 2.
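To make the quorum reasoning concrete, here is a minimal Python sketch of the situation. It is only a model, not elasticsearch's actual election code: it assumes each node simply counts the master-eligible nodes it can reach (itself included) and compares that count with discovery.zen.minimum_master_nodes. The names LINKS, visible_nodes and has_quorum are made up for illustration.

```python
# Toy model of the partial partition described above.
# Assumption: a node considers its quorum satisfied when the number of
# master-eligible nodes it can see (including itself) is >= the minimum.

MINIMUM_MASTER_NODES = 2  # discovery.zen.minimum_master_nodes from the settings

NODES = ["A", "B", "C"]

# Links that still work after the failure: A-B is broken, A-C and B-C are fine.
LINKS = {frozenset("AC"), frozenset("BC")}


def visible_nodes(node, links=LINKS, nodes=NODES):
    """Return the sorted list of nodes that `node` can see, itself included."""
    return sorted(n for n in nodes if n == node or frozenset((node, n)) in links)


def has_quorum(node, minimum=MINIMUM_MASTER_NODES):
    """True if `node` sees at least `minimum` master-eligible nodes."""
    return len(visible_nodes(node)) >= minimum


for node in NODES:
    print(node, visible_nodes(node), has_quorum(node))
```

In this model A sees {A, C} and B sees {B, C}, so both pass a minimum of 2 at the same time, which matches the two-master state in the logs. With a minimum of 3, only C would pass.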
When I ask B for its state, it responds that it is the master of a cluster
containing B and C.
When I ask A for its state, it responds that it is the master of a cluster
containing A and C.
When I ask C for its state, it responds with the same cluster state as A.
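The disagreement between the nodes can be checked mechanically. The sketch below is a hedged example, assuming the cluster state returned by GET /_cluster/state contains a top-level "master_node" field holding the elected master's node id. The helper names (fetch_state, masters_by_node_id) and the default HTTP port 9200 are my own assumptions, and the sample data simply mirrors the node ids from the logs above rather than real API output.

```python
import json
from urllib.request import urlopen


def fetch_state(host, port=9200):
    """Fetch the cluster state from one node (assumes the HTTP API on `port`)."""
    with urlopen(f"http://{host}:{port}/_cluster/state", timeout=10) as resp:
        return json.load(resp)


def masters_by_node_id(states):
    """Group node names by the master id each one reports.

    `states` maps a node name to its cluster state document.
    More than one group means the nodes disagree about who is master.
    """
    groups = {}
    for name, state in states.items():
        groups.setdefault(state["master_node"], []).append(name)
    return groups


# Illustrative data only, shaped like the situation described above:
# A and C report A as master, B reports itself.
SAMPLE_STATES = {
    "A": {"master_node": "O25rauSQR7utohD0jg4RQw"},
    "B": {"master_node": "sUv4amcFSdmaDAVDa7bUVg"},
    "C": {"master_node": "O25rauSQR7utohD0jg4RQw"},
}

groups = masters_by_node_id(SAMPLE_STATES)
if len(groups) > 1:
    print("nodes disagree about the master:", groups)
```

Against a live cluster, the states dictionary would be built by calling fetch_state once per node address instead of using SAMPLE_STATES.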
This can be reproduced by setting up three nodes (settings above) and then,
once a master has been established, dropping the connection between it and the
node you expect to become the next master (usually the next node in the list
after the master). I used the following commands:
On the master node (A): iptables -A INPUT -s <node B ip address> -j DROP
On the next node (B): iptables -A INPUT -s <node A ip address> -j DROP
This should get you into the same state that I have witnessed on AWS. Once
two masters are established, remove the iptables entries (by running
iptables -F on A and B). From what I understand, node discovery only happens
when a node is starting up or does not belong to a cluster, so as these nodes
both belong to a cluster they never discover each other.
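For anyone repeating the reproduction, the small Python sketch below builds the two blocking commands and their inverses. Note one deliberate difference from what I ran: it undoes the partition with iptables -D (delete the matching rule) rather than iptables -F, so that other rules on the host are left alone. The function names and the 10.0.0.x addresses are placeholders, and the script only prints the commands, it does not execute them.

```python
def partition_commands(ip_a, ip_b):
    """Map each host to the command that drops inbound traffic from the other."""
    return {
        ip_a: f"iptables -A INPUT -s {ip_b} -j DROP",
        ip_b: f"iptables -A INPUT -s {ip_a} -j DROP",
    }


def heal_commands(ip_a, ip_b):
    """Map each host to the -D command that deletes the rule added above."""
    return {
        host: cmd.replace("-A INPUT", "-D INPUT", 1)
        for host, cmd in partition_commands(ip_a, ip_b).items()
    }


# Placeholder addresses for the current master (A) and the likely next master (B).
NODE_A_IP = "10.0.0.1"
NODE_B_IP = "10.0.0.2"

for host, cmd in partition_commands(NODE_A_IP, NODE_B_IP).items():
    print(f"run on {host}: {cmd}")
for host, cmd in heal_commands(NODE_A_IP, NODE_B_IP).items():
    print(f"later, run on {host}: {cmd}")
```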
I have tried this against versions 0.90.0, 0.90.4, 0.90.7 and 1.0.0.Beta1
of elasticsearch, and all of them show the same behaviour. I was using the
elasticsearch-cloud-aws plugin version 1.11.0 for elasticsearch version
0.90.0 and version 1.15.0 for elasticsearch versions 0.90.4, 0.90.7 and
1.0.0.Beta1.
I do not want to have to set minimum_master_nodes to 3, as for this use case
I value availability.
Any help would be greatly appreciated.
Kind Regards,
Mark Tinsley