AWS EC2 Discovery: master nodes work but data nodes fail


(Mark Conlin) #1

My objective is to run a 6-node cluster on three EC2 instances.
I am placing one master-only and one data-only node on each instance (using the elastic ansible playbook).

The master nodes from the three instances all find each other without issue using EC2 discovery, form a cluster of three, and elect a master.
The data nodes from the same instances fail on startup with the error below.

What I have tried

  • switching data nodes to explicit zen.unicast discovery via host names works
  • I can telnet on port 9301 from instance A->B without issue
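
A minimal sketch of what the unicast fallback in the first bullet might look like in a data node's elasticsearch.yml. The host names are placeholders, and pointing the data nodes at the master nodes' transport port (9300) is an assumption, not a copy of the actual config:

```yaml
# Sketch only: replaces EC2 discovery with explicit unicast hosts on a data node.
# Host names below are hypothetical placeholders.
cluster.name: my-cluster
# discovery.type: ec2   # removed so the default zen discovery is used
discovery.zen.ping.multicast.enabled: false
discovery.zen.ping.unicast.hosts:
  - "master-host-a.example.internal:9300"
  - "master-host-b.example.internal:9300"
  - "master-host-c.example.internal:9300"
node.data: true
node.master: false
transport.tcp.port: 9301
```

Each entry carries an explicit port because, in this layout, the master nodes listen on 9300 while the data nodes listen on 9301.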

REFERENCE:

java version - OpenJDK Runtime Environment (IcedTea 2.5.6) (7u79-2.5.6-0ubuntu1.14.04.1)
es version - 2.1.0

data node elasticsearch.yml

bootstrap.mlockall: false
cloud.aws.region: us-east
cluster.name: my-cluster
discovery.ec2.groups: stage-elasticsearch
discovery.ec2.host_type: private_dns
discovery.ec2.ping_timeout: 30s
discovery.type: ec2
discovery.zen.minimum_master_nodes: 2
discovery.zen.ping.multicast.enabled: false
gateway.expected_nodes: 4
http.port: 9201
network.host: ec2:privateDns
node.data: true
node.master: false
transport.tcp.port: 9301
node.name: ip-xxx-xxx-xx-xxx.vpc.fakedomain.com-data1

master node elasticsearch.yml

bootstrap.mlockall: false
cloud.aws.region: us-east
cluster.name: my-cluster
discovery.ec2.groups: stage-elasticsearch
discovery.ec2.host_type: private_dns
discovery.ec2.ping_timeout: 30s
discovery.type: ec2
discovery.zen.minimum_master_nodes: 2
discovery.zen.ping.multicast.enabled: false
gateway.expected_nodes: 4
http.port: 9200
network.host: ec2:privateDns
node.data: false
node.master: true
transport.tcp.port: 9300
node.name: ip-xxx-xxx-xx-xxx.vpc.fakedomain.com-master

Only the data nodes hit this error on startup:

[2016-03-02 15:45:06,246][INFO ][node ] [ip-xxx-xxx-xx-xxx.vpc.fakedomain.com-data1] initializing ...
[2016-03-02 15:45:06,679][INFO ][plugins ] [ip-xxx-xxx-xx-xxx.vpc.fakedomain.com-data1] loaded [cloud-aws], sites [head]
[2016-03-02 15:45:06,710][INFO ][env ] [ip-xxx-xxx-xx-xxx.vpc.fakedomain.com-data1] using [1] data paths, mounts [[/ (/dev/xvda1)]], net usable_space [11.5gb], net total_space [14.6gb], spins? [no], types [ext4]
[2016-03-02 15:45:09,597][INFO ][node ] [ip-xxx-xxx-xx-xxx.vpc.fakedomain.com-data1] initialized
[2016-03-02 15:45:09,597][INFO ][node ] [ip-xxx-xxx-xx-xxx.vpc.fakedomain.com-data1] starting ...
[2016-03-02 15:45:09,678][INFO ][transport ] [ip-xxx-xxx-xx-xxx.vpc.fakedomain.com-data1] publish_address {ip-xxx-xxx-xx-xxx.vpc.fakedomain.com-data1/xxx-xxx-xx-xxx:9301}, bound_addresses {xxx-xxx-xx-xxx:9301}
[2016-03-02 15:45:09,687][INFO ][discovery ] [ip-xxx-xxx-xx-xxx.vpc.fakedomain.com-data1] my-cluster/PNI6WAmzSYGgZcX2HsqenA
[2016-03-02 15:45:09,701][WARN ][com.amazonaws.jmx.SdkMBeanRegistrySupport]
java.security.AccessControlException: access denied ("javax.management.MBeanServerPermission" "findMBeanServer")
at java.security.AccessControlContext.checkPermission(AccessControlContext.java:372)

.... edited for length .....

ccess$5000(ZenDiscovery.java:75)

at org.elasticsearch.discovery.zen.ZenDiscovery$JoinThreadControl$1.run(ZenDiscovery.java:1236)

[2016-03-02 15:45:09,703][WARN ][com.amazonaws.metrics.AwsSdkMetrics]
java.security.AccessControlException: access denied ("javax.management.MBeanServerPermission" "findMBeanServer")
at java.security.AccessControlContext.checkPermission(AccessControlContext.java:372)
at java.security.AccessController.checkPermission(AccessController.java:559)
at java.lang.SecurityManager.checkPermission(SecurityManager.java:549)

.... snipped for length .....

ss$5000(ZenDiscovery.java:75)

at org.elasticsearch.discovery.zen.ZenDiscovery$JoinThreadControl$1.run(ZenDiscovery.java:1236)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
[2016-03-02 15:45:39,688][WARN ][discovery ] [ip-xxx-xxx-xx-xxx.vpc.fakedomain.com-data1] waited for 30s and no initial state was set by the discovery
[2016-03-02 15:45:39,698][INFO ][http ] [ip-xxx-xxx-xx-xxx.vpc.fakedomain.com-data1] publish_address {ip-xxx-xxx-xx-xxx.vpc.fakedomain.com-data1/xxx-xxx-xx-xxx:9201}, bound_addresses {xxx-xxx-xx-xxx:9201}
[2016-03-02 15:45:39,699][INFO ][node ] [ip-xxx-xxx-xx-xxx.vpc.fakedomain.com-data1] started

