Dealing with node failures

My cluster config:
ES version: 6.2.3
Number of nodes: 3 (all master-eligible and data)
Number of replicas: 1

I ran tests that sent concurrent search requests to the ES cluster using the Java REST client API (3 hosts, with sniffer), then killed a node.
My expectation was that, since there are replicas on the other two nodes, the requests would succeed.
But some requests failed partially, and the response status was still 200.


    {
      "took": 558,
      "timed_out": false,
      "_shards": {
        "total": 6,
        "successful": 5,
        "skipped": 0,
        "failed": 1,
        "failures": [
          {
            "shard": 5,
            "index": "v1",
            "reason": {
              "type": "node_disconnected_exception",
              "reason": "~~[indices:data/read/search[phase/fetch/id]] disconnected"
            }
          }
        ]
      }
    }

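Because the HTTP status is 200 even when a shard fails, the client has to inspect `_shards.failed` itself. The following is a minimal sketch of that check using only the Java standard library; the class and method names (`ShardFailureCheck`, `failedShards`, `isPartial`) are hypothetical, and the regex is a stand-in for a proper JSON parser, which would be the safer choice in real code.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of any Elasticsearch client library.
final class ShardFailureCheck {
    // Captures the number after "failed" inside the "_shards" object.
    // Assumes "_shards" contains no nested object before "failed",
    // as in the response shown above.
    private static final Pattern FAILED = Pattern.compile(
        "\"_shards\"\\s*:\\s*\\{[^}]*?\"failed\"\\s*:\\s*(\\d+)");

    // Returns the value of _shards.failed from a raw search response body.
    static int failedShards(String responseBody) {
        Matcher m = FAILED.matcher(responseBody);
        if (!m.find()) {
            throw new IllegalArgumentException("no _shards.failed in response body");
        }
        return Integer.parseInt(m.group(1));
    }

    // True when the search returned results from only some of the shards.
    static boolean isPartial(String responseBody) {
        return failedShards(responseBody) > 0;
    }
}
```

For the response above, `failedShards` would return 1, so the request can be treated as partial despite the 200 status.
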
So my questions are:

  1. Is there any way to get a successful response (without shard failures)?

  2. If I have to handle retries in such cases, how long would it take to get a successful response?
    When I retried right after the shard failures, all requests succeeded. I wonder whether I can always get a successful response with a single retry after a node_disconnected_exception failure.
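
The retry I have in mind could look roughly like the sketch below. It is a generic, bounded retry loop written against the Java standard library only; `RetryOnPartial`, `retryUntilComplete`, and their parameters are hypothetical names, and the caller supplies both the search call and the "no shard failures" check. It makes no assumption that a single retry is always enough, which is why the number of attempts and the backoff are parameters.

```java
import java.util.function.Predicate;
import java.util.function.Supplier;

// Hypothetical helper, not part of any Elasticsearch client library.
final class RetryOnPartial {
    // Runs `search` up to maxAttempts times until `isComplete` accepts the result.
    // Sleeps backoffMillis * attempt between attempts (linear backoff).
    // Returns the last result even if it is still partial, so the caller can decide.
    static <T> T retryUntilComplete(Supplier<T> search, Predicate<T> isComplete,
                                    int maxAttempts, long backoffMillis) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        T last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            last = search.get();
            if (isComplete.test(last)) {
                return last;
            }
            if (attempt < maxAttempts) {
                try {
                    Thread.sleep(backoffMillis * attempt);
                } catch (InterruptedException e) {
                    // Preserve the interrupt flag and stop retrying.
                    Thread.currentThread().interrupt();
                    return last;
                }
            }
        }
        return last;
    }
}
```

Here `search` would wrap the actual search request and `isComplete` would test that the number of failed shards is zero. Returning the last partial result instead of throwing is a design choice for this sketch; a caller that cannot tolerate partial results could throw at that point instead.
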


Related topics:
Should Elasticsearch return a non-200 response if there are shard failures? #18978

How do people typically handle shard failures in their results?
