Elasticsearch 6.1.1: "all shards failed" after restart

➜  ~ curl -XGET 'localhost:9200/_cluster/health/balance_sheet?pretty'                                     
{
  "cluster_name" : "elasticsearch",
  "status" : "red",
  "timed_out" : false,
  "number_of_nodes" : 1,
  "number_of_data_nodes" : 1,
  "active_primary_shards" : 0,
  "active_shards" : 0,
  "relocating_shards" : 0,
  "initializing_shards" : 0,
  "unassigned_shards" : 10,
  "delayed_unassigned_shards" : 0,
  "number_of_pending_tasks" : 4,
  "number_of_in_flight_fetch" : 0,
  "task_max_waiting_in_queue_millis" : 378,
  "active_shards_percent_as_number" : 7.501013259220659
}

curl -XPOST 'localhost:9200/balance_sheet/doc/_search?pretty' -H 'Content-Type: application/json' -d'
{
  "query": {
    "term": { "securityId" : "stock_sz_000002" }
  },
  "size": 1,
  "sort": [
    {
      "timestamp": {
        "order": "desc"
      }
    }
  ]
}
'
{
  "error" : {
    "root_cause" : [ ],
    "type" : "search_phase_execution_exception",
    "reason" : "all shards failed",
    "phase" : "query",
    "grouped" : true,
    "failed_shards" : [ ]
  },
  "status" : 503
}

And the logs show:

[2018-01-26T14:15:30,706][INFO ][o.e.n.Node               ] [uoYBml-] started
[2018-01-26T14:16:07,317][INFO ][o.e.l.LicenseService     ] [uoYBml-] license [dd71f992-0c83-46b5-a452-d5a8ae4d18d9] mode [trial] - valid
[2018-01-26T14:16:07,319][INFO ][o.e.g.GatewayService     ] [uoYBml-] recovered [3455] indices into cluster_state
[2018-01-26T14:16:07,320][WARN ][o.e.c.s.MasterService    ] [uoYBml-] cluster state update task [local-gateway-elected-state] took [30.4s] above the warn threshold of 30s
[2018-01-26T14:27:43,896][DEBUG][o.e.a.s.TransportSearchAction] [uoYBml-] All shards failed for phase: [query]
[2018-01-26T14:27:43,898][WARN ][r.suppressed             ] path: /balance_sheet/doc/_search, params: {pretty=, index=balance_sheet, type=doc}
org.elasticsearch.action.search.SearchPhaseExecutionException: all shards failed
	at org.elasticsearch.action.search.AbstractSearchAsyncAction.onPhaseFailure(AbstractSearchAsyncAction.java:274) ~[elasticsearch-6.1.1.jar:6.1.1]
	at org.elasticsearch.action.search.AbstractSearchAsyncAction.executeNextPhase(AbstractSearchAsyncAction.java:132) ~[elasticsearch-6.1.1.jar:6.1.1]
	at org.elasticsearch.action.search.AbstractSearchAsyncAction.onPhaseDone(AbstractSearchAsyncAction.java:243) ~[elasticsearch-6.1.1.jar:6.1.1]
	at org.elasticsearch.action.search.InitialSearchPhase.onShardFailure(InitialSearchPhase.java:107) ~[elasticsearch-6.1.1.jar:6.1.1]
	at org.elasticsearch.action.search.InitialSearchPhase.lambda$performPhaseOnShard$4(InitialSearchPhase.java:205) ~[elasticsearch-6.1.1.jar:6.1.1]
	at org.elasticsearch.action.search.InitialSearchPhase$1.doRun(InitialSearchPhase.java:184) [elasticsearch-6.1.1.jar:6.1.1]
	at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:637) [elasticsearch-6.1.1.jar:6.1.1]
	at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) [elasticsearch-6.1.1.jar:6.1.1]
	at org.elasticsearch.common.util.concurrent.TimedRunnable.doRun(TimedRunnable.java:41) [elasticsearch-6.1.1.jar:6.1.1]
	at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) [elasticsearch-6.1.1.jar:6.1.1]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [?:1.8.0_111]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [?:1.8.0_111]
	at java.lang.Thread.run(Thread.java:745) [?:1.8.0_111]
[2018-01-26T14:29:41,367][DEBUG][o.e.a.s.TransportSearchAction] [uoYBml-] All shards failed for phase: [query]
[2018-01-26T14:29:41,369][WARN ][r.suppressed             ] path: /balance_sheet/doc/_search, params: {pretty=, index=balance_sheet, type=doc}
org.elasticsearch.action.search.SearchPhaseExecutionException: all shards failed
	at org.elasticsearch.action.search.AbstractSearchAsyncAction.onPhaseFailure(AbstractSearchAsyncAction.java:274) ~[elasticsearch-6.1.1.jar:6.1.1]
	at org.elasticsearch.action.search.AbstractSearchAsyncAction.executeNextPhase(AbstractSearchAsyncAction.java:132) ~[elasticsearch-6.1.1.jar:6.1.1]
	at org.elasticsearch.action.search.AbstractSearchAsyncAction.onPhaseDone(AbstractSearchAsyncAction.java:243) ~[elasticsearch-6.1.1.jar:6.1.1]
	at org.elasticsearch.action.search.InitialSearchPhase.onShardFailure(InitialSearchPhase.java:107) ~[elasticsearch-6.1.1.jar:6.1.1]
	at org.elasticsearch.action.search.InitialSearchPhase.lambda$performPhaseOnShard$4(InitialSearchPhase.java:205) ~[elasticsearch-6.1.1.jar:6.1.1]
	at org.elasticsearch.action.search.InitialSearchPhase$1.doRun(InitialSearchPhase.java:184) [elasticsearch-6.1.1.jar:6.1.1]
	at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:637) [elasticsearch-6.1.1.jar:6.1.1]
	at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) [elasticsearch-6.1.1.jar:6.1.1]
	at org.elasticsearch.common.util.concurrent.TimedRunnable.doRun(TimedRunnable.java:41) [elasticsearch-6.1.1.jar:6.1.1]
	at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) [elasticsearch-6.1.1.jar:6.1.1]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [?:1.8.0_111]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [?:1.8.0_111]
	at java.lang.Thread.run(Thread.java:745) [?:1.8.0_111]

The cluster health status:

➜  ~ curl -XGET 'localhost:9200/_cluster/health?pretty'                                                   
{
  "cluster_name" : "elasticsearch",
  "status" : "red",
  "timed_out" : false,
  "number_of_nodes" : 1,
  "number_of_data_nodes" : 1,
  "active_primary_shards" : 3083,
  "active_shards" : 3083,
  "relocating_shards" : 0,
  "initializing_shards" : 4,
  "unassigned_shards" : 31455,
  "delayed_unassigned_shards" : 0,
  "number_of_pending_tasks" : 4,
  "number_of_in_flight_fetch" : 0,
  "task_max_waiting_in_queue_millis" : 998,
  "active_shards_percent_as_number" : 8.92536622083261
}

I changed the following defaults in my elasticsearch.yml:

discovery.type: single-node

node.ml: false
xpack.ml.enabled: false
xpack.security.enabled: false
xpack.monitoring.enabled: false

I have encountered this problem several times, and it seems all I can do is delete all the indices and reindex them.

I have confirmed that it should not be related to the max open file handles limit:

➜  ~ ulimit -S
unlimited
➜  ~ cat /proc/sys/fs/file-max
1600480

Has anyone run into this problem before, or could anyone offer a suggestion? Thanks very much!

You have far too many shards given the size of your cluster. You need to change your sharding strategy dramatically. Have a look at the following blog post for some guidance around shard count and size:

If you can reindex your data from an external source, the easiest way will probably be to delete all data and reindex from scratch.
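To put rough numbers on "far too many shards", here is a back-of-the-envelope sketch that uses only the figures posted above (the cluster health output and the "recovered [3455] indices" log line). The reading of the result as 5 primaries plus 5 replicas per index, which is the 6.x default, is an assumption, since the index settings were not posted.

```shell
#!/usr/bin/env bash
# Figures copied from the question; nothing here talks to the cluster.
indices=3455        # "recovered [3455] indices into cluster_state"
active=3083         # active_shards
initializing=4      # initializing_shards
unassigned=31455    # unassigned_shards

# Total shards the single node is being asked to manage.
total=$((active + initializing + unassigned))

# Average shards per index, rounded to the nearest integer.
per_index=$(( (total + indices / 2) / indices ))

echo "total shards:      $total"
echo "shards per index:  about $per_index"
```

Roughly ten shards per index is consistent with the ten unassigned shards reported for balance_sheet. On a one-node cluster the replica copies can never be assigned, because Elasticsearch does not place a replica on the same node as its primary, so about half of these shards would stay unassigned even after recovery finishes.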

Thanks for your quick and awesome response!

Additional info:
The cluster recovered on its own after about 4 hours, so it really is a problem with the number of shards.
I will change my sharding strategy. Thanks again.
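One possible shape for that change, as a minimal sketch rather than a recommendation from this thread: an index template that gives newly created indices one primary shard and no replicas. The template name "fewer_shards", the catch-all pattern "*" and the ES_URL variable are hypothetical, and the values assume a single-node cluster with small indices. The script only prints the request body unless ES_URL is set.

```shell
#!/usr/bin/env bash
# Sketch only: template name, pattern and shard counts are assumptions.
template_body='{
  "index_patterns": ["*"],
  "settings": {
    "number_of_shards": 1,
    "number_of_replicas": 0
  }
}'

# Show what would be sent.
echo "$template_body"

# Send it only when ES_URL is set, e.g. ES_URL=localhost:9200
if [ -n "${ES_URL:-}" ]; then
  curl -XPUT "$ES_URL/_template/fewer_shards" \
       -H 'Content-Type: application/json' -d "$template_body"
fi
```

A template only affects indices created after it is installed. The primary shard count of an existing index cannot be changed in place, which is why reindexing from scratch was suggested above. Reducing the number of indices, for example by combining many small ones, would cut the shard count further.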
