I run a 2-node test/reference cluster on EC2, with the 2 nodes in
different availability zones (east 1a and east 1b). Under 0.15.2, this
worked fine.
I am currently upgrading to 0.16.2 (no other changes), and discovery
now appears to work only between machines in the same availability
zone.
My YML configuration is very simple (just pasted below since it's so
short):
cluster:
  name: infinite-aws
discovery:
  type: ec2
cloud:
  aws:
    access_key:
    secret_key:
bootstrap:
  mlockall: true
(I tried adding
"ec2:availability_zones: us-east-1a,us-east-1b,us-east-1c", but this
didn't make a difference - not surprisingly, since the discovery phase
works)
Node A (east 1a) and Node B (east 1b) can both reach each other's
private IP addresses via telnet. (But they are on different Class C
subnets, so any broadcasts wouldn't work.) The log files indicate that
the 2 nodes find each other using the discovery.ec2 mechanism.
There are no log messages (even with discovery and transport logging
at debug level) between the point where the correct list of EC2 nodes
is returned (as noted above, I've confirmed by hand/telnet that I can
connect to the node:ports listed) and the "ping responses: {none}"
message, after which the node declares itself master.
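For reference, the by-hand telnet check amounts to roughly the
following sketch. The function names and the example IPs in the usage
note are hypothetical (not taken from my logs); port 9300 is the
transport port the nodes report binding to. It only tests that a plain
TCP connection can be opened, nothing about the ping protocol itself.

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a plain TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False


def check_peers(peers, port=9300, timeout=3.0):
    """Map each peer address to whether its transport port is reachable."""
    return {host: can_connect(host, port, timeout) for host in peers}
```

Running something like check_peers(["10.0.1.10", "10.0.2.20"]) from
each node, with the private IPs that discovery.ec2 lists, should give
the same answer as the telnet test.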
The only other interesting thing to happen in the log is a set of
"received ping response with no matching id [1]" messages across all
nodes after one node declares itself master. (I saw the other thread
where the problem was the nodes binding themselves to an IPv6 address,
but here the transport log messages indicate the nodes are correctly
binding themselves to 0.0.0.0:9300.)
For both nodes, the log confirms they are running
elasticsearch/0.16.2, and (although the log doesn't confirm versioning
for the AWS plugin) I can see the line
"Downloading plugin [...] cloud-aws-0.16.2.zip" on the console of both
nodes.
When I started up a Node C on east 1b, Nodes B and C found each other
as expected.
So is EC2 discovery supposed to work across availability sub-zones?
(Or was I taking advantage of an unintended feature in 0.15.2 - though
the presence of discovery.ec2.availability_zones suggests not)?
If so, has anyone seen it work?
I can provide further details as needed, i.e. if there's no simple
explanation or the above is too unclear for a quick diagnosis.