[Solved] ElasticSearch - ec2 : Discovery doesn't work with cloud-aws plugin

Hello,

I am having problems with the Elasticsearch cloud-aws plugin: it doesn't discover any other instances.

Here is the configuration I used:

cluster.name: test_tferdinand
node.name: ip-10-203-13-175
network.host: 0.0.0.0

cloud:
    aws:
        region: eu-west-1
discovery:
    type: ec2

My instance uses the following IAM role:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": "ec2:Describe*",
      "Resource": "*"
    }
  ]
} 

and the following inbound rules :

outbound rule :

Here are the startup logs:
[2016-10-31 08:29:39,938][INFO ][node ] [ip-10-203-13-12] version[2.4.1], pid[3311], build[c67dc32/2016-09-27T18:57:55Z]
[2016-10-31 08:29:39,939][INFO ][node ] [ip-10-203-13-12] initializing ...
[2016-10-31 08:29:40,895][INFO ][plugins ] [ip-10-203-13-12] modules [lang-groovy, reindex, lang-expression], plugins [cloud-aws], sites
[2016-10-31 08:29:40,927][INFO ][env ] [ip-10-203-13-12] using [1] data paths, mounts [[/ (/dev/xvda1)]], net usable_space [6.5gb], net total_space [7.7gb], spins? [no], types [ext4]
[2016-10-31 08:29:40,927][INFO ][env ] [ip-10-203-13-12] heap size [1015.6mb], compressed ordinary object pointers [true]
[2016-10-31 08:29:40,928][WARN ][env ] [ip-10-203-13-12] max file descriptors [4096] for elasticsearch process likely too low, consider increasing to at least [65536]
[2016-10-31 08:29:43,374][INFO ][node ] [ip-10-203-13-12] initialized
[2016-10-31 08:29:43,374][INFO ][node ] [ip-10-203-13-12] starting ...
[2016-10-31 08:29:43,438][INFO ][transport ] [ip-10-203-13-12] publish_address {10.203.13.12:9300}, bound_addresses {10.203.13.12:9300}
[2016-10-31 08:29:43,442][INFO ][discovery ] [ip-10-203-13-12] test_tferdinand/M6F_ibLkT46OTQRkZXXLoA
[2016-10-31 08:29:49,922][INFO ][cluster.service ] [ip-10-203-13-12] new_master {ip-10-203-13-12}{M6F_ibLkT46OTQRkZXXLoA}{10.203.13.12}{10.203.13.12:9300}, reason: zen-disco-join(elected_as_master, [0] joins received)
[2016-10-31 08:29:49,957][INFO ][http ] [ip-10-203-13-12] publish_address {10.203.13.12:9200}, bound_addresses {10.203.13.12:9200}
[2016-10-31 08:29:49,957][INFO ][node ] [ip-10-203-13-12] started
[2016-10-31 08:29:49,962][INFO ][gateway ] [ip-10-203-13-12] recovered [0] indices into cluster_state

Can someone help me before I go crazy?
Thanks

The indentation of the region setting looks wrong.
Can you check that?

Hello,

Thanks for your reply.

I just botched my copy/paste.

The indentation is fine in my elasticsearch.yml file.

I have edited my first post.

Can you check whether it works with an access key/secret key instead of IAM roles?
We fixed an issue in 5.0 which might also be present in 2.4.1, so that's why I'm asking.
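
For reference, a minimal sketch of what that test could look like in elasticsearch.yml with the 2.x cloud-aws plugin. The key values are placeholders, not real credentials, and the rest of the configuration is assumed to stay as posted above:

```yaml
cloud:
    aws:
        # Placeholders: substitute the credentials of an IAM user
        # that is allowed to call ec2:DescribeInstances.
        access_key: YOUR_ACCESS_KEY
        secret_key: YOUR_SECRET_KEY
        region: eu-west-1
discovery:
    type: ec2
```

If discovery behaves the same with explicit keys, the IAM role is probably not the cause.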

Thanks!

I tried, but I still have the same issue.

Here are the startup logs:

[2016-10-31 12:20:42,032][INFO ][node                     ] [ip-10-203-13-175] initializing ...
[2016-10-31 12:20:43,040][INFO ][plugins                  ] [ip-10-203-13-175] modules [lang-groovy, reindex, lang-expression], plugins [cloud-aws], sites []
[2016-10-31 12:20:43,077][INFO ][env                      ] [ip-10-203-13-175] using [1] data paths, mounts [[/ (/dev/xvda1)]], net usable_space [6.5gb], net total_space [7.7gb], spins? [no], types [ext4]
[2016-10-31 12:20:43,085][INFO ][env                      ] [ip-10-203-13-175] heap size [1015.6mb], compressed ordinary object pointers [true]
[2016-10-31 12:20:43,085][WARN ][env                      ] [ip-10-203-13-175] max file descriptors [4096] for elasticsearch process likely too low, consider increasing to at least [65536]
[2016-10-31 12:20:44,595][DEBUG][com.amazonaws.AmazonWebServiceClient] Internal logging succesfully configured to commons logger: true
[2016-10-31 12:20:45,589][INFO ][node                     ] [ip-10-203-13-175] initialized
[2016-10-31 12:20:45,590][INFO ][node                     ] [ip-10-203-13-175] starting ...
[2016-10-31 12:20:45,650][INFO ][transport                ] [ip-10-203-13-175] publish_address {10.203.13.175:9300}, bound_addresses {[::]:9300}
[2016-10-31 12:20:45,654][INFO ][discovery                ] [ip-10-203-13-175] test_tferdinand/liaBfQ5VSwWL_E4_Ue-mEA
[2016-10-31 12:20:45,678][DEBUG][com.amazonaws.auth.AWSCredentialsProviderChain] Loading credentials from com.amazonaws.internal.StaticCredentialsProvider@41938a35
[2016-10-31 12:20:45,693][DEBUG][com.amazonaws.request    ] Sending Request: POST https://ec2.eu-west-1.amazonaws.com / Parameters: ({"Action":["DescribeInstances"],"Version":["2015-10-01"],"Filter.1.Name":["instance-state-name"],"Filter.1.Value.1":["running"],"Filter.1.Value.2":["pending"]}Headers: (User-Agent: aws-sdk-java/1.10.69 Linux/4.4.19-29.55.amzn1.x86_64 OpenJDK_64-Bit_Server_VM/24.111-b01/1.7.0_111, amz-sdk-invocation-id: c79fd64f-7885-43eb-ae97-ca8fa358e714, )
[2016-10-31 12:20:45,699][DEBUG][com.amazonaws.auth.AWS4Signer] AWS4 Canonical Request: '"POST
/

amz-sdk-invocation-id:c79fd64f-7885-43eb-ae97-ca8fa358e714
amz-sdk-retry:0/0/
host:ec2.eu-west-1.amazonaws.com
user-agent:aws-sdk-java/1.10.69 Linux/4.4.19-29.55.amzn1.x86_64 OpenJDK_64-Bit_Server_VM/24.111-b01/1.7.0_111
x-amz-date:20161031T122045Z

amz-sdk-invocation-id;amz-sdk-retry;host;user-agent;x-amz-date
150231f74850cde9dd65816532b383893b9fcfe8171a39f2147a60917617e3aa"
[2016-10-31 12:20:45,704][DEBUG][com.amazonaws.auth.AWS4Signer] AWS4 String to Sign: '"AWS4-HMAC-SHA256
20161031T122045Z
20161031/eu-west-1/ec2/aws4_request
aa46c421d75b99d6bd6ff7231caec91624be12fd251c8cbd3b1f18f644710cd6"
[2016-10-31 12:20:45,705][DEBUG][com.amazonaws.auth.AWS4Signer] Generating a new signing key as the signing key not available in the cache for the date 1477872000000
[2016-10-31 12:20:45,871][DEBUG][com.amazonaws.http.conn.ssl.SdkTLSSocketFactory] socket.getSupportedProtocols(): [SSLv2Hello, SSLv3, TLSv1, TLSv1.1, TLSv1.2], socket.getEnabledProtocols(): [TLSv1]
[2016-10-31 12:20:45,871][DEBUG][com.amazonaws.http.conn.ssl.SdkTLSSocketFactory] TLS protocol enabled for SSL handshake: [TLSv1.2, TLSv1.1, TLSv1]
[2016-10-31 12:20:45,872][DEBUG][com.amazonaws.http.conn.ssl.SdkTLSSocketFactory] connecting to ec2.eu-west-1.amazonaws.com/54.239.39.130:443
[2016-10-31 12:20:46,201][DEBUG][com.amazonaws.internal.SdkSSLSocket] created: ec2.eu-west-1.amazonaws.com/54.239.39.130:443
[2016-10-31 12:20:46,223][DEBUG][com.amazonaws.http.impl.client.SdkHttpClient] Attempt 1 to execute request
[2016-10-31 12:20:46,580][DEBUG][com.amazonaws.http.impl.client.SdkHttpClient] Connection can be kept alive for 60000 MILLISECONDS
[2016-10-31 12:20:46,593][DEBUG][com.amazonaws.requestId  ] x-amzn-RequestId: not available
[2016-10-31 12:20:47,389][DEBUG][com.amazonaws.request    ] Received successful response: 200, AWS Request ID: c6eb4776-99e9-4fc7-bb71-1c9099418c74
[2016-10-31 12:20:47,395][DEBUG][com.amazonaws.requestId  ] AWS Request ID: c6eb4776-99e9-4fc7-bb71-1c9099418c74
[2016-10-31 12:20:52,032][INFO ][cluster.service          ] [ip-10-203-13-175] new_master {ip-10-203-13-175}{liaBfQ5VSwWL_E4_Ue-mEA}{10.203.13.175}{10.203.13.175:9300}, reason: zen-disco-join(elected_as_master, [0] joins received)
[2016-10-31 12:20:52,053][INFO ][http                     ] [ip-10-203-13-175] publish_address {10.203.13.175:9200}, bound_addresses {[::]:9200}
[2016-10-31 12:20:52,053][INFO ][node                     ] [ip-10-203-13-175] started
[2016-10-31 12:20:52,073][INFO ][gateway                  ] [ip-10-203-13-175] recovered [0] indices into cluster_state

The plugin appears to be communicating correctly with the AWS API:

[2016-10-31 12:20:47,389][DEBUG][com.amazonaws.request    ] Received successful response: 200, AWS Request ID: c6eb4776-99e9-4fc7-bb71-1c9099418c74

Maybe you can set network.host: _ec2_ and see how it goes?

Another test you can try is to run without the AWS plugin, define the unicast list of nodes manually, and check whether that works.

If it works, then we can try to figure out what is happening with the plugin.
If not, then you probably have another problem (firewall...)
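
A minimal sketch of that manual test for a 2.x node, assuming the two addresses that appear in the logs in this thread (10.203.13.12 and 10.203.13.175) are the nodes meant to form the cluster. The discovery.type: ec2 setting is left out so that the default zen discovery is used:

```yaml
cluster.name: test_tferdinand
network.host: 0.0.0.0

discovery:
    zen:
        ping:
            unicast:
                # Assumption: these are the private IPs of the intended
                # cluster members, taken from the logs in this thread.
                hosts: ["10.203.13.12", "10.203.13.175"]
```

If the nodes still do not join each other with this list, the problem is likely at the network level (security group rules on port 9300, for example) rather than in the plugin.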

To be sure of what is happening, you can also set the log level for org.apache to DEBUG so we can see exactly what AWS sends back to the plugin.
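
On a 2.x install this would typically be done in config/logging.yml, under the existing logger section. A sketch, assuming the default layout of that file:

```yaml
logger:
    # Logs the HTTP traffic of the Apache HTTP client used by the AWS SDK,
    # including the response bodies returned by the EC2 API.
    org.apache: DEBUG
```

This is very verbose, so it is best reverted once the test is done.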

I finally got it to work.

Here is the final configuration I ended up with:

cloud:
    aws:
        region: eu-west-1

discovery:
    type: ec2
    ec2:
        groups: [my_security_group]
        host_type: private_ip

network:
    host: _ec2:privateIpv4_

When I request "/_cluster/health", I see "number_of_nodes":3

I think the AWS API was returning too many results when they were not filtered this way.
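
For anyone who cannot filter by security group, the 2.x cloud-aws plugin also documents filtering by instance tag. A sketch, where the tag name es_cluster and its value are hypothetical and would have to exist on the instances:

```yaml
discovery:
    type: ec2
    ec2:
        host_type: private_ip
        tag:
            # Hypothetical tag: only instances tagged es_cluster=test_tferdinand
            # are considered as discovery candidates.
            es_cluster: test_tferdinand
```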

Thank you for your help, and sorry for the inconvenience.