Elasticsearch 2.0 aws-cloud network issue


(Liran Shani) #1

I use aws (VPC) for ES cluster.
After upgrading from ES 1.7.2 to ES 2.0 I have network issues:
[WARN ][transport.netty ] [es2-data.localdomain] exception caught on transport layer [[id: 0xd878a1ca, /10.111.112.87:38887 => /10.111.112.176:9300]], closing connection

Node discovery does not work.

$: curl localhost:9200/_nodes/process?pretty
The problem is that in ES 2.0 the transport address is set to localhost ("transport_address" : "127.0.0.1:9300"),
whereas in ES 1.7.2 it was "transport_address" : "inet[/10.111.112.86:9300]".

My question is how to correctly configure the network settings in ES 2.0 so that the private IP is used.


(David Pilato) #2

Read this: https://www.elastic.co/guide/en/elasticsearch/plugins/current/cloud-aws-discovery.html#cloud-aws-discovery-network-host


(Liran Shani) #3

I read this, but I am not sure which settings to use.


(David Pilato) #4

Maybe you missed the lines just above?

 network.host: _ec2_

Should work


(Liran Shani) #5

I tried this and got this exception:

[2015-11-24 07:07:09,820][ERROR][bootstrap ] Guice Exception: java.lang.IllegalArgumentException: No enum constant org.elasticsearch.discovery.ec2.AwsEc2UnicastHostsProvider.HostType.EC2

My settings (elasticsearch.yml):

cluster.name: ABC

node.name: ABCDEFGH.localdomain
node.master: true
node.data: true
node.max_local_storage_nodes: 1

index.mapper.dynamic: true
action.auto_create_index: true
action.disable_delete_all_indices: true

bootstrap.mlockall: true

http.port: 9200
network.host: _ec2 _

discovery.type: ec2

discovery.zen.ping.multicast.enabled: false

cloud.aws.access_key: ABCDEFGH
cloud.aws.secret_key: ABCDEFGH
cloud.aws.region: eu-west-1
discovery.ec2.groups: ABCDEFGH
discovery.ec2.availability_zones: ["eu-west-1a", "eu-west-1b"]

(David Pilato) #6

It sounds like a bug to me.

Do you have a full stack trace?

Could you try with _ec2:privateIp_?


(Liran Shani) #7

I tried with:
network.host: _ec2:privateIp_

After that, this request failed:

ubuntu@es-data2:~$ curl localhost:9200/_nodes/process?pretty
curl: (7) Failed to connect to localhost port 9200: Connection refused

stacktrace:

[2015-11-24 08:03:18,819][INFO ][node ] [es2-data.localdomain] version[2.0.0], pid[6576], build[de54438/2015-10-22T08:09:48Z]
[2015-11-24 08:03:18,820][INFO ][node ] [es2-data.localdomain] initializing ...
[2015-11-24 08:03:19,087][INFO ][plugins ] [es2-data.localdomain] loaded [cloud-aws], sites [hq]
[2015-11-24 08:03:19,122][INFO ][env ] [es2-data.localdomain] using [1] data paths, mounts [[/ (/dev/xvda1)]], net usable_space [4.2gb], net total_space [7.7gb], spins? [no], types [ext4]
[2015-11-24 08:03:21,722][INFO ][node ] [es2-data.localdomain] initialized
[2015-11-24 08:03:21,722][INFO ][node ] [es2-data.localdomain] starting ...
[2015-11-24 08:03:21,821][INFO ][transport ] [es2-data.localdomain] publish_address {10.111.112.87:9300}, bound_addresses {10.111.112.87:9300}
[2015-11-24 08:03:21,832][INFO ][discovery ] [es2-data.localdomain] ABCDEFGHtest/nIfqDAOYQ8WprHxFGeIxRQ
[2015-11-24 08:03:22,665][WARN ][transport.netty ] [es2-data.localdomain] exception caught on transport layer [[id: 0x192d25cb, /10.111.112.87:58884 => /10.111.112.73:9300]], closing connection
java.lang.NullPointerException
at org.elasticsearch.transport.netty.MessageChannelHandler.handleException(MessageChannelHandler.java:206)
at org.elasticsearch.transport.netty.MessageChannelHandler.handlerResponseError(MessageChannelHandler.java:201)
..
at java.lang.Thread.run(Thread.java:745)
[2015-11-24 08:03:22,666][WARN ][transport.netty ] [es2-data.localdomain] exception caught on transport layer [[id: 0xe7228172, /10.111.112.87:37220 => /10.111.112.226:9300]], closing connection
java.lang.NullPointerException
....
[2015-11-24 08:03:22,665][WARN ][discovery.zen.ping.unicast] [es2-data.localdomain] failed to send ping to [{#cloud-i-11e13da8-0}{10.111.112.121}{10.111.112.121:9300}]
RemoteTransportException[[es1-data.localdomain][10.111.112.121:9300][internal:discovery/zen/unicast]]; nested: IllegalStateException[received ping request while not started];
Caused by: java.lang.IllegalStateException: received ping request while not started

[2015-11-24 08:03:24,324][WARN ][discovery.zen.ping.unicast] [es2-data.localdomain] failed to send ping to [{#cloud-i-11e13da8-0}{10.111.112.121}{10.111.112.121:9300}]
RemoteTransportException[[es1-data.localdomain][10.111.112.121:9300][internal:discovery/zen/unicast]]; nested: IllegalStateException[received ping request while not started];
Caused by: java.lang.IllegalStateException: received ping request while not started
at org.elasticsearch.discovery.zen.ping.unicast.UnicastZenPing.handlePingRequest(UnicastZenPing.java:478)
at org.elasticsearch.discovery.zen.ping.unicast.UnicastZenPing.access$2400(UnicastZenPing.java:64)
at org.elasticsearch.discovery.zen.ping.unicast.UnicastZenPing$UnicastPingRequestHandler.messageReceived(UnicastZenPing.java:503)
...
[2015-11-24 08:03:27,517][INFO ][cluster.service ] [es2-data.localdomain] new_master {es2-data.localdomain}{nIfqDAOYQ8WprHxFGeIxRQ}{10.111.112.87}{10.111.112.87:9300}{max_local_storage_nodes=1, master=true}, reason: zen-disco-join(elected_as_master, [0] joins received)
[2015-11-24 08:03:27,544][INFO ][http ] [es2-data.localdomain] publish_address {10.111.112.87:9200}, bound_addresses {10.111.112.87:9200}
[2015-11-24 08:03:27,544][INFO ][node ] [es2-data.localdomain] started
[2015-11-24 08:03:27,552][INFO ][gateway ] [es2-data.localdomain] recovered [0] indices into cluster_state
[2015-11-24 08:03:58,215][INFO ][cluster.service ] [es2-data.localdomain] added {{es1-data.localdomain}{K70lDLMYSe2Ok15X_m9BUQ}{10.111.112.121}{10.111.112.121:9300}{max_local_storage_nodes=1, master=true},}, reason: zen-disco-join(join from node[{es1-data.localdomain}{K70lDLMYSe2Ok15X_m9BUQ}{10.111.112.121}{10.111.112.121:9300}{max_local_storage_nodes=1, master=true}])


(David Pilato) #8

Everything looks fine here. It binds to 10.111.112.87 (the EC2 private IP), so you need to run curl 10.111.112.87:9200.

But could you run the same again with _ec2_ and paste the stack trace?
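
A minimal sketch of the check above: the log shows HTTP bound only to the private IP, so curl against localhost is refused. The address can be read from the [http] publish_address line in the startup log and passed to curl. The helper name http_publish_address and the log path in the usage comment are assumptions for illustration, not something from this thread.

```shell
# Hypothetical helper: read an Elasticsearch log on stdin and print the
# address from the most recent [http] publish_address line,
# e.g. 10.111.112.87:9200.
http_publish_address() {
  sed -n 's/.*\[http *].*publish_address {\([^}]*\)}.*/\1/p' | tail -n 1
}

# Usage (the log path is an assumption; adjust it to your install):
#   addr=$(http_publish_address < /var/log/elasticsearch/ABC.log)
#   curl "http://$addr/_nodes/process?pretty"
```

The transport line also contains publish_address, so the pattern matches only the line tagged [http].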


(Liran Shani) #9

With _ec2_ I had a typo earlier.

I am now using network.host: _ec2_ and the transport_address looks correct (the 2 nodes form a cluster), but the log still contains many exceptions:

[2015-11-24 09:53:21,778][INFO ][node                     ] [es2-data.localdomain] version[2.0.0], pid[9509], build[de54438/2015-10-22T08:09:48Z]
[2015-11-24 09:53:21,779][INFO ][node                     ] [es2-data.localdomain] initializing ...
[2015-11-24 09:53:22,073][INFO ][plugins                  ] [es2-data.localdomain] loaded [cloud-aws], sites [hq]
[2015-11-24 09:53:22,110][INFO ][env                      ] [es2-data.localdomain] using [1] data paths, mounts [[/ (/dev/xvda1)]], net usable_space [4.2gb], net total_space [7.7gb], spins? [no], types [ext4]
[2015-11-24 09:53:24,687][INFO ][node                     ] [es2-data.localdomain] initialized
[2015-11-24 09:53:24,687][INFO ][node                     ] [es2-data.localdomain] starting ...
[2015-11-24 09:53:24,794][INFO ][transport                ] [es2-data.localdomain] publish_address {10.111.112.87:9300}, bound_addresses {10.111.112.87:9300}
[2015-11-24 09:53:24,803][INFO ][discovery                ] [es2-data.localdomain] sisense-monitoring-cluster-test/9xDZmZXYRQOg2il-XOQ0KQ
**[2015-11-24 09:53:25,693][WARN ][transport.netty          ] [es2-data.localdomain] exception caught on transport layer [[id: 0x75753f9e, /10.111.112.87:59382 => /10.111.112.73:9300]], closing connection**
**java.lang.NullPointerException**
	at org.elasticsearch.transport.netty.MessageChannelHandler.handleException(MessageChannelHandler.java:206)
	at org.elasticsearch.transport.netty.MessageChannelHandler.handlerResponseError(MessageChannelHandler.java:201)

	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:745)
[2015-11-24 09:53:25,695][WARN ][transport.netty          ] [es2-data.localdomain] exception caught on transport layer [[id: 0xc649b4e4, /10.111.112.87:55535 => /10.111.112.134:9300]], closing connection
java.lang.NullPointerException
	at org.elasticsearch.transport.netty.MessageChannelHandler.handleException(MessageChannelHandler.java:206)

	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:745)
[2015-11-24 09:53:25,694][WARN ][transport.netty          ] [es2-data.localdomain] exception caught on transport layer [[id: 0xfc3e4419, /10.111.112.87:37718 => /10.111.112.226:9300]], closing connection
java.lang.NullPointerException
	at org.elasticsearch.transport.netty.MessageChannelHandler.handleException(MessageChannelHandler.java:206)

	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:745)
[2015-11-24 09:53:25,694][WARN ][discovery.zen.ping.unicast] [es2-data.localdomain] failed to send ping to [{#cloud-i-11e13da8-0}{10.111.112.121}{10.111.112.121:9300}]
RemoteTransportException[[es1-data.localdomain][10.111.112.121:9300][internal:discovery/zen/unicast]]; nested: IllegalStateException[received ping request while not started];
Caused by: java.lang.IllegalStateException: received ping request while not started
	at org.elasticsearch.discovery.zen.ping.unicast.UnicastZenPing.handlePingRequest(UnicastZenPing.java:478)

	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:745)
[2015-11-24 09:53:25,698][WARN ][transport.netty          ] [es2-data.localdomain] exception caught on transport layer [[id: 0x91df9511, /10.111.112.87:46313 => /10.111.112.86:9300]], closing connection
java.lang.NullPointerException

	at java.lang.Thread.run(Thread.java:745)
[2015-11-24 09:53:30,632][INFO ][cluster.service          ] [es2-data.localdomain] new_master {es2-data.localdomain}{9xDZmZXYRQOg2il-XOQ0KQ}{10.111.112.87}{10.111.112.87:9300}{max_local_storage_nodes=1}, reason: zen-disco-join(elected_as_master, [0] joins received)
[2015-11-24 09:53:30,660][INFO ][http                     ] [es2-data.localdomain] publish_address {10.111.112.87:9200}, bound_addresses {10.111.112.87:9200}
[2015-11-24 09:53:30,660][INFO ][node                     ] [es2-data.localdomain] started
[2015-11-24 09:53:30,665][INFO ][gateway                  ] [es2-data.localdomain] recovered [0] indices into cluster_state
[2015-11-24 09:54:01,004][INFO ][cluster.service          ] [es2-data.localdomain] added {{es1-data.localdomain}{TfgeBr1fR3StfHQdpXiPbQ}{10.111.112.121}{10.111.112.121:9300}{max_local_storage_nodes=1},}, reason: zen-disco-join(join from node[{es1-data.localdomain}{TfgeBr1fR3StfHQdpXiPbQ}{10.111.112.121}{10.111.112.121:9300}{max_local_storage_nodes=1}])

(David Pilato) #10

You are mixing node versions apparently.

Some "older" nodes are trying to communicate with this new one.

See https://github.com/elastic/elasticsearch/issues/14400
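
To confirm which nodes are still on the old version, one option is to ask each node's root endpoint, which reports version.number. This is a rough sketch only: the helper names es_version and node_version are made up for illustration, and it assumes each node answers HTTP on port 9200 at its private IP.

```shell
# Hypothetical helper: print version.number from the JSON that the root
# endpoint (GET /) returns, read on stdin.
es_version() {
  sed -n 's/.*"number" *: *"\([^"]*\)".*/\1/p'
}

# Hypothetical helper: ask one node for its version over HTTP.
# Assumes the node serves HTTP on port 9200 at the given address.
node_version() {
  curl -s --max-time 5 "http://$1:9200/" | es_version
}

# Usage, with the peers from the transport warnings in the log above:
#   for ip in 10.111.112.73 10.111.112.134 10.111.112.226 10.111.112.86; do
#     echo "$ip $(node_version "$ip")"
#   done
```

Any address that reports a 1.x version is a node that still needs to be upgraded or stopped.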


(Liran Shani) #11

I will check it.
Indeed, I am trying to upgrade from 1.7 to 2.0. There might be ES 1.7 nodes still running.
I will update...

