Hello All,
I am using the AWS plugin (cloud-aws, version 2.7.1) with ES 1.7.5 on AWS for cluster discovery.
All is well most of the time.
Sometimes, however, it simply does not work, and there is no indication why.
Scenario 1.
A new cluster is launched using a script, and the instances get the relevant role assigned.
Once the cluster is up, requests return HTTP 503. The instances are then terminated and the cluster is launched again using the same script. On the second or third attempt the cluster comes up and all is well, with no changes made. Sometimes it succeeds the first time.
Scenario 2.
A change was made to elasticsearch.yml on each node of a running cluster that had been launched with the script mentioned above. I then restarted the nodes one by one while keeping an eye on cluster status.
After its restart, one node was simply unable to discover the rest of the cluster using the AWS plugin. Again, this is a node that had previously joined this same cluster using the plugin.
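For anyone trying to reproduce this, the "keeping an eye on cluster status" step during a rolling restart could be scripted roughly as follows. This is only a minimal sketch, not the script used here: the function names `fetch_health` and `wait_for_nodes`, the default host and port, and the retry counts are all illustrative assumptions. It polls the `_cluster/health` endpoint and treats any HTTP error (such as the 503 seen in Scenario 1) as "not ready yet".

```python
import json
import time
import urllib.request


def fetch_health(host="localhost", port=9200, timeout=5):
    """Return the parsed JSON body of GET /_cluster/health.

    Host and port are assumptions; point them at the node being checked.
    """
    url = "http://%s:%d/_cluster/health" % (host, port)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


def wait_for_nodes(expected_nodes, fetch=fetch_health, attempts=30, delay=10.0):
    """Poll until the cluster reports at least expected_nodes and is not red.

    Returns True on success, False once the attempts are used up.
    """
    for _ in range(attempts):
        try:
            health = fetch()
            if (health.get("number_of_nodes", 0) >= expected_nodes
                    and health.get("status") in ("green", "yellow")):
                return True
        except (OSError, ValueError):
            # urllib's HTTPError/URLError are OSError subclasses (covers a
            # 503 or a refused connection); ValueError covers invalid JSON.
            pass
        time.sleep(delay)
    return False
```

Calling `wait_for_nodes(3)` after restarting each node of a three-node cluster would block until the restarted node has rejoined, or return False if it never does, which is the situation described above.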
Has anyone come across this before?
Thanks in advance,
Extract from the log:
[2016-08-01 12:27:52,888][INFO ][node ] [ip-10-20-xxx-xxx] initializing ...
[2016-08-01 12:27:53,055][INFO ][plugins ] [ip-10-20-xxx-xxx] loaded [cloud-aws], sites []
[2016-08-01 12:27:53,107][INFO ][env ] [ip-10-20-xxx-xxx] using [1] data paths, mounts [[/ (/dev/xvda1)]], net usable_space [119.1gb], net total_space [125.8gb], types [ext4]
[2016-08-01 12:27:56,716][INFO ][node ] [ip-10-20-xxx-xxx] initialized
[2016-08-01 12:27:56,716][INFO ][node ] [ip-10-20-xxx-xxx] starting ...
[2016-08-01 12:27:56,842][INFO ][transport ] [ip-10-20-xxx-xxx] bound_address {inet[/0:0:0:0:0:0:0:0:9300]}, publish_address {inet[/10.20.xxx.xxx:9300]}
[2016-08-01 12:27:56,899][INFO ][discovery ] [ip-10-20-xxx-xxx] thisEsCluster/HOO-XTFrSduT5uirb-xBGA
[2016-08-01 12:28:26,899][WARN ][discovery ] [ip-10-20-xxx-xxx] waited for 30s and no initial state was set by the discovery
[2016-08-01 12:28:26,904][INFO ][http ] [ip-10-20-xxx-xxx] bound_address {inet[/0:0:0:0:0:0:0:0:9200]}, publish_address {inet[/10.20.xxx.xxx:9200]}
[2016-08-01 12:28:26,904][INFO ][node ] [ip-10-20-xxx-xxx] started
[2016-08-01 12:39:53,738][DEBUG][action.admin.cluster.health] [ip-10-20-xxx-xxx] no known master node, scheduling a retry
[2016-08-01 12:40:23,740][DEBUG][action.admin.cluster.health] [ip-10-20-xxx-xxx] observer: timeout notification from cluster service. timeout setting [30s], time since start [30s]
[2016-08-01 12:40:23,934][DEBUG][action.admin.indices.get ] [ip-10-20-xxx-xxx] no known master node, scheduling a retry
[2016-08-01 12:40:53,934][DEBUG][action.admin.indices.get ] [ip-10-20-xxx-xxx] observer: timeout notification from cluster service. timeout setting [30s], time since start [30s]
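Since the failure is intermittent, it may help to scan the node logs automatically for the two symptoms visible above: the discovery timeout warning and the "no known master node" retries. The following is a rough sketch that assumes the log line layout shown in this extract; the regular expression, the `SYMPTOMS` list, and the function name `find_discovery_problems` are my own illustrative choices, not part of Elasticsearch.

```python
import re

# Matches lines such as:
# [2016-08-01 12:28:26,899][WARN ][discovery ] [node-name] message text
LOG_LINE = re.compile(
    r"^\[(?P<time>[^\]]+)\]"
    r"\[(?P<level>\w+)\s*\]"
    r"\[(?P<component>[^\]]+?)\s*\]\s+"
    r"\[(?P<node>[^\]]+)\]\s+"
    r"(?P<message>.*)$"
)

# Message fragments taken from the log extract above.
SYMPTOMS = (
    "no initial state was set by the discovery",
    "no known master node",
)


def find_discovery_problems(lines):
    """Return (time, level, component, message) for each symptomatic line."""
    hits = []
    for line in lines:
        match = LOG_LINE.match(line.strip())
        if not match:
            continue  # skip lines that do not follow the expected layout
        message = match.group("message")
        if any(symptom in message for symptom in SYMPTOMS):
            hits.append((
                match.group("time"),
                match.group("level"),
                match.group("component"),
                message,
            ))
    return hits
```

Passing the lines of a node's log file to `find_discovery_problems` gives a list of the timestamps at which the node failed to find a master, which can then be compared across launches that worked and launches that did not.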