I have an Elasticsearch cluster with three servers (testnode00, testnode01,
testnode02), with two Elasticsearch instances running on each server
(transport ports 9300 and 9301), six instances in total.
The instances have been configured with the
cluster.routing.allocation.awareness.attributes=zone,tag setting so that
both instances running on the same server can die and the cluster still
works properly.
The config file is at https://gist.github.com/jjheinon/7989423
This also works in practice: I can shut down both instances on the same
server and everything keeps working.
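For readers without access to the gist, here is a hypothetical sketch of how two instances per server could be started with these awareness attributes, passing 0.90 settings as -Des. system properties. It only prints the commands. The ES_HOME path, the zone value, using the host name as the tag value, the HTTP port of the second instance and the unicast host list are all assumptions, not copied from the actual config.

```shell
#!/bin/sh
# Hypothetical sketch: print (not run) the start command for one
# Elasticsearch 0.90 instance. ES_HOME, the zone value, the host name as
# "tag" and the unicast host list are assumptions, not the real config.
ES_HOME="${ES_HOME:-/opt/elasticsearch}"

# All six transport addresses (three servers x two ports), assumed.
UNICAST_HOSTS="testnode00:9300,testnode00:9301,testnode01:9300,testnode01:9301,testnode02:9300,testnode02:9301"

# instance_cmd HOST TRANSPORT_PORT HTTP_PORT ZONE
instance_cmd() {
  echo "$ES_HOME/bin/elasticsearch -f" \
    "-Des.cluster.name=test_cluster" \
    "-Des.node.tag=$1" \
    "-Des.node.zone=$4" \
    "-Des.transport.tcp.port=$2" \
    "-Des.http.port=$3" \
    "-Des.cluster.routing.allocation.awareness.attributes=zone,tag" \
    "-Des.discovery.zen.ping.multicast.enabled=false" \
    "-Des.discovery.zen.ping.unicast.hosts=$UNICAST_HOSTS"
}

# Both instances on testnode00 share one tag value, so allocation
# awareness tries to place a shard's copies on different servers.
instance_cmd testnode00 9300 9200 zone-a
instance_cmd testnode00 9301 9201 zone-a
```

The tag value is what matters for the "both instances on one server may die" property: if every server has a distinct tag, awareness avoids putting a primary and its replica behind the same host.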
Everything works fine until I actually shut down one of the servers (in
this case testnode01). Then the whole cluster becomes unresponsive.
The basic status requests do work:
curl 'http://testnode00:9200/'
->
{
"ok" : true,
"status" : 200,
"name" : "testnode00_ebs",
"version" : {
"number" : "0.90.5",
"build_hash" : "c8714e8e0620b62638f660f6144831792b9dedee",
"build_timestamp" : "2013-09-17T13:09:46Z",
"build_snapshot" : false,
"lucene_version" : "4.4"
},
"tagline" : "You Know, for Search"
}
The cluster health request also works, and it still reports all six nodes
and a green status:
curl 'http://testnode00:9200/_cluster/health'
->
{
"active_primary_shards" : 120,
"active_shards" : 240,
"cluster_name" : "test_cluster",
"initializing_shards" : 0,
"number_of_data_nodes" : 6,
"number_of_nodes" : 6,
"relocating_shards" : 2,
"status" : "green",
"timed_out" : false,
"unassigned_shards" : 0
}
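With testnode01 powered off, one would expect number_of_nodes to drop from 6 to 4, yet the response above still shows 6. A small hypothetical helper for watching that field from a script (the function names are illustrative; it parses with sed and assumes each field and its value sit on the same line, so jq would be more robust if it is installed):

```shell
#!/bin/sh
# health_field NAME: read a _cluster/health JSON response on stdin and
# print the value of field NAME (number, boolean or string, quotes removed).
health_field() {
  sed -n "s/.*\"$1\" *: *\"\{0,1\}\([^\",}]*\).*/\1/p"
}

# node_count_ok EXPECTED: succeed only if number_of_nodes equals EXPECTED.
node_count_ok() {
  [ "$(health_field number_of_nodes)" = "$1" ]
}
```

For example, after stopping testnode01: curl -s 'http://testnode00:9200/_cluster/health' | node_count_ok 4 || echo "cluster still lists the dead nodes"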
But the node stats request times out:
curl 'http://testnode00:9200/_nodes/stats'
-> Timeout
Search requests no longer work either:
curl 'http://testnode00:9200/_search/?q=name:test'
-> Timeout
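To tell a hung request apart from a refused connection, the endpoints above can be probed with a hard time limit. This is a hypothetical sketch using curl's --max-time; the node address and the 10-second limit are assumptions.

```shell
#!/bin/sh
# Hypothetical probe: hit each endpoint from the report with a hard time
# limit so a hung request shows up as TIMEOUT instead of blocking forever.
ES="${ES:-http://testnode00:9200}"   # assumed: HTTP address of a live node
MAX_TIME="${MAX_TIME:-10}"           # seconds; arbitrary choice

# Map a curl exit status to a label.
# 0 = success, 7 = could not connect, 22 = HTTP error (with -f), 28 = timed out.
curl_result() {
  case "$1" in
    0)  echo OK ;;
    7)  echo CONNECT_FAILED ;;
    22) echo HTTP_ERROR ;;
    28) echo TIMEOUT ;;
    *)  echo "ERROR($1)" ;;
  esac
}

# probe PATH: request $ES$PATH and print PATH with the result label.
probe() {
  rc=0
  curl -s -f -o /dev/null --max-time "$MAX_TIME" "$ES$1" || rc=$?
  echo "$1 -> $(curl_result "$rc")"
}

# probe_all: run the four requests shown in this report.
probe_all() {
  for p in / /_cluster/health /_nodes/stats '/_search/?q=name:test'; do
    probe "$p"
  done
}
```

Running probe_all after powering off testnode01 should print OK for the first two paths and TIMEOUT for the last two, matching the behaviour described above.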
Nothing shows up in the Elasticsearch log when I shut down the server.
If I instead manually shut down both Elasticsearch instances on the server,
I get node disconnect messages in the log, everything fails over properly,
and all the above requests work:
[2013-12-16 15:57:59,945][DEBUG][action.admin.cluster.node.stats] [testnode00_ebs] failed to execute on node [Th4-MYtTTdGh3wZFh3W4vA]
org.elasticsearch.transport.NodeDisconnectedException: [testnode01_ebs][inet[/10.43.129.161:9300]][cluster/nodes/stats/n] disconnected
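To check whether any disconnect was logged at all after powering off a server, a hypothetical helper can list the node names that appear in NodeDisconnectedException lines. The log path in the usage line is an assumption (0.90 names the log after the cluster by default).

```shell
#!/bin/sh
# disconnected_nodes LOGFILE: print each node name that appears in a
# NodeDisconnectedException line, once, in sorted order.
disconnected_nodes() {
  sed -n 's/.*NodeDisconnectedException: \[\([^]]*\)].*/\1/p' "$1" | sort -u
}
```

For example: disconnected_nodes /var/log/elasticsearch/test_cluster.log prints testnode01_ebs for the excerpt above, and nothing at all in the power-off case.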
Any ideas why unicast discovery doesn't detect missing servers? Setting
discovery.zen.ping.timeout does not seem to help. And why doesn't the
_nodes/stats request work if one of the nodes is unresponsive?
Is there a way to tune the timeout (TTL) values for requests between
Elasticsearch nodes?
Additional question:
Is there a way to tell the cloud-aws EC2 discovery plugin to find both
instances on a single server, or does it detect only the first one (on port
9300, not the one on 9301)?
Regards,
// Janne
--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.