Elasticsearch 6.1.1 restart and I got "all shards failed"


(Foolcage) #1
➜  ~ curl -XGET 'localhost:9200/_cluster/health/balance_sheet?pretty'                                     
{
  "cluster_name" : "elasticsearch",
  "status" : "red",
  "timed_out" : false,
  "number_of_nodes" : 1,
  "number_of_data_nodes" : 1,
  "active_primary_shards" : 0,
  "active_shards" : 0,
  "relocating_shards" : 0,
  "initializing_shards" : 0,
  "unassigned_shards" : 10,
  "delayed_unassigned_shards" : 0,
  "number_of_pending_tasks" : 4,
  "number_of_in_flight_fetch" : 0,
  "task_max_waiting_in_queue_millis" : 378,
  "active_shards_percent_as_number" : 7.501013259220659
}

curl -XPOST 'localhost:9200/balance_sheet/doc/_search?pretty' -H 'Content-Type: application/json' -d'
{
  "query": {
    "term": { "securityId" : "stock_sz_000002" }
  },
  "size": 1,
  "sort": [
    {
      "timestamp": {
        "order": "desc"
      }
    }
  ]
}
'
{
  "error" : {
    "root_cause" : [ ],
    "type" : "search_phase_execution_exception",
    "reason" : "all shards failed",
    "phase" : "query",
    "grouped" : true,
    "failed_shards" : [ ]
  },
  "status" : 503
}

And the logs show:

[2018-01-26T14:15:30,706][INFO ][o.e.n.Node               ] [uoYBml-] started
[2018-01-26T14:16:07,317][INFO ][o.e.l.LicenseService     ] [uoYBml-] license [dd71f992-0c83-46b5-a452-d5a8ae4d18d9] mode [trial] - valid
[2018-01-26T14:16:07,319][INFO ][o.e.g.GatewayService     ] [uoYBml-] recovered [3455] indices into cluster_state
[2018-01-26T14:16:07,320][WARN ][o.e.c.s.MasterService    ] [uoYBml-] cluster state update task [local-gateway-elected-state] took [30.4s] above the warn threshold of 30s
[2018-01-26T14:27:43,896][DEBUG][o.e.a.s.TransportSearchAction] [uoYBml-] All shards failed for phase: [query]
[2018-01-26T14:27:43,898][WARN ][r.suppressed             ] path: /balance_sheet/doc/_search, params: {pretty=, index=balance_sheet, type=doc}
org.elasticsearch.action.search.SearchPhaseExecutionException: all shards failed
	at org.elasticsearch.action.search.AbstractSearchAsyncAction.onPhaseFailure(AbstractSearchAsyncAction.java:274) ~[elasticsearch-6.1.1.jar:6.1.1]
	at org.elasticsearch.action.search.AbstractSearchAsyncAction.executeNextPhase(AbstractSearchAsyncAction.java:132) ~[elasticsearch-6.1.1.jar:6.1.1]
	at org.elasticsearch.action.search.AbstractSearchAsyncAction.onPhaseDone(AbstractSearchAsyncAction.java:243) ~[elasticsearch-6.1.1.jar:6.1.1]
	at org.elasticsearch.action.search.InitialSearchPhase.onShardFailure(InitialSearchPhase.java:107) ~[elasticsearch-6.1.1.jar:6.1.1]
	at org.elasticsearch.action.search.InitialSearchPhase.lambda$performPhaseOnShard$4(InitialSearchPhase.java:205) ~[elasticsearch-6.1.1.jar:6.1.1]
	at org.elasticsearch.action.search.InitialSearchPhase$1.doRun(InitialSearchPhase.java:184) [elasticsearch-6.1.1.jar:6.1.1]
	at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:637) [elasticsearch-6.1.1.jar:6.1.1]
	at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) [elasticsearch-6.1.1.jar:6.1.1]
	at org.elasticsearch.common.util.concurrent.TimedRunnable.doRun(TimedRunnable.java:41) [elasticsearch-6.1.1.jar:6.1.1]
	at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) [elasticsearch-6.1.1.jar:6.1.1]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [?:1.8.0_111]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [?:1.8.0_111]
	at java.lang.Thread.run(Thread.java:745) [?:1.8.0_111]
[2018-01-26T14:29:41,367][DEBUG][o.e.a.s.TransportSearchAction] [uoYBml-] All shards failed for phase: [query]
[2018-01-26T14:29:41,369][WARN ][r.suppressed             ] path: /balance_sheet/doc/_search, params: {pretty=, index=balance_sheet, type=doc}
org.elasticsearch.action.search.SearchPhaseExecutionException: all shards failed
	at org.elasticsearch.action.search.AbstractSearchAsyncAction.onPhaseFailure(AbstractSearchAsyncAction.java:274) ~[elasticsearch-6.1.1.jar:6.1.1]
	at org.elasticsearch.action.search.AbstractSearchAsyncAction.executeNextPhase(AbstractSearchAsyncAction.java:132) ~[elasticsearch-6.1.1.jar:6.1.1]
	at org.elasticsearch.action.search.AbstractSearchAsyncAction.onPhaseDone(AbstractSearchAsyncAction.java:243) ~[elasticsearch-6.1.1.jar:6.1.1]
	at org.elasticsearch.action.search.InitialSearchPhase.onShardFailure(InitialSearchPhase.java:107) ~[elasticsearch-6.1.1.jar:6.1.1]
	at org.elasticsearch.action.search.InitialSearchPhase.lambda$performPhaseOnShard$4(InitialSearchPhase.java:205) ~[elasticsearch-6.1.1.jar:6.1.1]
	at org.elasticsearch.action.search.InitialSearchPhase$1.doRun(InitialSearchPhase.java:184) [elasticsearch-6.1.1.jar:6.1.1]
	at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:637) [elasticsearch-6.1.1.jar:6.1.1]
	at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) [elasticsearch-6.1.1.jar:6.1.1]
	at org.elasticsearch.common.util.concurrent.TimedRunnable.doRun(TimedRunnable.java:41) [elasticsearch-6.1.1.jar:6.1.1]
	at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) [elasticsearch-6.1.1.jar:6.1.1]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [?:1.8.0_111]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [?:1.8.0_111]
	at java.lang.Thread.run(Thread.java:745) [?:1.8.0_111]

The cluster health status:

➜  ~ curl -XGET 'localhost:9200/_cluster/health?pretty'                                                   
{
  "cluster_name" : "elasticsearch",
  "status" : "red",
  "timed_out" : false,
  "number_of_nodes" : 1,
  "number_of_data_nodes" : 1,
  "active_primary_shards" : 3083,
  "active_shards" : 3083,
  "relocating_shards" : 0,
  "initializing_shards" : 4,
  "unassigned_shards" : 31455,
  "delayed_unassigned_shards" : 0,
  "number_of_pending_tasks" : 4,
  "number_of_in_flight_fetch" : 0,
  "task_max_waiting_in_queue_millis" : 998,
  "active_shards_percent_as_number" : 8.92536622083261
}
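
For context, the scale of the problem can be estimated from the numbers above. The log shows 3455 indices being recovered, and with the Elasticsearch 6.x defaults of 5 primary shards plus 1 replica per index, that works out to roughly 34,550 shard copies for this single node. The health output reports 3083 + 4 + 31455 = 34,542 shards in total, which matches closely. A quick back-of-the-envelope check (the per-index defaults are an assumption; individual indices may override them):

```shell
# Back-of-the-envelope shard math, assuming the ES 6.x defaults of
# 5 primary shards and 1 replica per index.
indices=3455
shards_per_index=$((5 * 2))            # 5 primaries * (primary + 1 replica copy)
total=$((indices * shards_per_index))
echo "$total"                          # prints 34550

# Totals reported by the cluster health API above:
reported=$((3083 + 4 + 31455))         # active + initializing + unassigned
echo "$reported"                       # prints 34542
```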

I changed the following defaults in my elasticsearch.yml:

discovery.type: single-node

node.ml: false
xpack.ml.enabled: false
xpack.security.enabled: false
xpack.monitoring.enabled: false

I have encountered this problem several times, and it seems I can only fix it by deleting all the indices and reindexing them again.

I have confirmed that it is not related to the max open file handles issue:

➜  ~ ulimit -S
unlimited
➜  ~ cat /proc/sys/fs/file-max
1600480

Has anyone met this problem before, or could you give some suggestions? Thanks very much!


(Christian Dahlqvist) #2

You have far too many shards given the size of your cluster. You need to change your sharding strategy dramatically. Have a look at the following blog post for some guidance around shard count and size:

If you can reindex your data from an external source, the easiest way will probably be to delete all data and reindex from scratch.
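
A sketch of what that could look like on a single-node cluster (the index name and shard settings here are illustrative, not a recommendation for every index, and this assumes the data really can be rebuilt from the external source):

```shell
# WARNING: this deletes data. Only do this if you can reindex
# everything from an external source.

# See which indices currently exist and how big they are:
curl -XGET 'localhost:9200/_cat/indices?v'

# Delete an index you can rebuild:
curl -XDELETE 'localhost:9200/balance_sheet'

# Recreate it with far fewer shards (1 primary and 0 replicas is often
# enough on a single-node cluster), then reindex into it:
curl -XPUT 'localhost:9200/balance_sheet' -H 'Content-Type: application/json' -d'
{
  "settings": {
    "index": {
      "number_of_shards": 1,
      "number_of_replicas": 0
    }
  }
}
'
```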


(Foolcage) #3

Thanks for your quick and awesome response!


(Foolcage) #4

Additional info:
The cluster recovered again after about 4 hours, so it really is the large shard count that is the problem.
I will change the sharding strategy. Thanks again.


(system) #5

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.