All shards failed for phase: [query] on Elasticsearch 7.2.0

Extinguo · August 1, 2019, 7:46am

Hey guys,
I'm running Elasticsearch 7.2.0 on a single machine in Docker containers. It works fine except for some problems with the shards.
I let Elasticsearch run for a few days and looked at the logs today. They look fine, except for one thing: Elasticsearch threw a lot of "All shards failed" exceptions at some points. Please find the stack trace below:

{"type": "server", "timestamp": "2019-07-30T15:03:57,209+0200", "level": "DEBUG", "component": "o.e.a.s.TransportSearchAction", "cluster.name": "A-Elastic-Stack", "node.name": "es01", "cluster.uuid": "qxxxxxxxxxxxxxx", "node.id": "0xxxxxxxxxxx",  "message": "All shards failed for phase: [query]"  }
{"type": "server", "timestamp": "2019-07-30T15:03:57,213+0200", "level": "WARN", "component": "r.suppressed", "cluster.name": "A-Elastic-Stack", "node.name": "es01", "cluster.uuid": "qxxxxxxxxxxxx", "node.id": "0xxxxxxxxxxxxx",  "message": "path: /.kibana_task_manager/_search, params: {ignore_unavailable=true, index=.kibana_task_manager}" ,
"stacktrace": ["org.elasticsearch.action.search.SearchPhaseExecutionException: all shards failed",
"at org.elasticsearch.action.search.AbstractSearchAsyncAction.onPhaseFailure(AbstractSearchAsyncAction.java:296) [elasticsearch-7.2.0.jar:7.2.0]",
"at org.elasticsearch.action.search.AbstractSearchAsyncAction.executeNextPhase(AbstractSearchAsyncAction.java:139) [elasticsearch-7.2.0.jar:7.2.0]",
"at org.elasticsearch.action.search.AbstractSearchAsyncAction.onPhaseDone(AbstractSearchAsyncAction.java:259) [elasticsearch-7.2.0.jar:7.2.0]",
"at org.elasticsearch.action.search.InitialSearchPhase.onShardFailure(InitialSearchPhase.java:105) [elasticsearch-7.2.0.jar:7.2.0]",
"at org.elasticsearch.action.search.InitialSearchPhase.lambda$performPhaseOnShard$1(InitialSearchPhase.java:251) [elasticsearch-7.2.0.jar:7.2.0]",
"at org.elasticsearch.action.search.InitialSearchPhase$1.doRun(InitialSearchPhase.java:172) [elasticsearch-7.2.0.jar:7.2.0]",
"at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) [elasticsearch-7.2.0.jar:7.2.0]",
"at org.elasticsearch.common.util.concurrent.TimedRunnable.doRun(TimedRunnable.java:44) [elasticsearch-7.2.0.jar:7.2.0]",
"at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:758) [elasticsearch-7.2.0.jar:7.2.0]",
"at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) [elasticsearch-7.2.0.jar:7.2.0]",
"at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) [?:?]",
"at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) [?:?]",
"at java.lang.Thread.run(Thread.java:835) [?:?]"] }

The results of GET /_cluster/health/?level=shards look fine to me. Please find a snippet of the result below:

  "cluster_name" : "A-Elastic-Stack",
  "status" : "green",
  "timed_out" : false,
  "number_of_nodes" : 1,
  "number_of_data_nodes" : 1,
  "active_primary_shards" : 37,
  "active_shards" : 37,
  "relocating_shards" : 0,
  "initializing_shards" : 0,
  "unassigned_shards" : 0,
  "delayed_unassigned_shards" : 0,
  "number_of_pending_tasks" : 0,
  "number_of_in_flight_fetch" : 0,
  "task_max_waiting_in_queue_millis" : 0,
  "active_shards_percent_as_number" : 100.0,
  "indices" : {
    ".monitoring-logstash-7-2019.07.28" : {
      "status" : "green",
      "number_of_shards" : 1,
      "number_of_replicas" : 0,
      "active_primary_shards" : 1,
      "active_shards" : 1,
      "relocating_shards" : 0,
      "initializing_shards" : 0,
      "unassigned_shards" : 0,
      "shards" : {
        "0" : {
          "status" : "green",
          "primary_active" : true,
          "active_shards" : 1,
          "relocating_shards" : 0,
          "initializing_shards" : 0,
          "unassigned_shards" : 0
        }
      }
    },
...
    ".kibana_task_manager" : {
      "status" : "green",
      "number_of_shards" : 1,
      "number_of_replicas" : 0,
      "active_primary_shards" : 1,
      "active_shards" : 1,
      "relocating_shards" : 0,
      "initializing_shards" : 0,
      "unassigned_shards" : 0,
      "shards" : {
        "0" : {
          "status" : "green",
          "primary_active" : true,
          "active_shards" : 1,
          "relocating_shards" : 0,
          "initializing_shards" : 0,
          "unassigned_shards" : 0
        }
      }
    },

All of the shards are green.
I also noticed that Elasticsearch occasionally throws similar errors during startup. However, the startup behaviour varies between "all shards failed", ".security-7 failed", and no exceptions whatsoever.

Can somebody give me a hint what to look at?
Could it be related to the setting auto_expand_replicas, which is set to "0-1"? Is ES trying to create a replica but failing to do so since there are no other instances of ES up and running?
Thanks in advance!

DavidTurner · August 1, 2019, 8:08am

Were there any other log messages around the same times? I might expect to see this while a node is starting up, or maybe shutting down, but not otherwise in a one-node cluster.

No, if Elasticsearch has only ever seen a single node then it won't have been trying to create a replica.
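
To check for other log messages around the same time, something like the following rough sketch could help. It assumes the server log is in the JSON format shown above with one complete JSON object per line (entries that are wrapped across several lines, like the pasted stack trace, are simply skipped), and the function names `parse_ts` and `entries_near` are made up for illustration.

```python
import json
from datetime import datetime, timedelta

# Matches timestamps such as 2019-07-30T15:03:57,209+0200
TS_FORMAT = "%Y-%m-%dT%H:%M:%S,%f%z"


def parse_ts(value):
    """Parse a timestamp in the format used by the JSON server log."""
    return datetime.strptime(value, TS_FORMAT)


def entries_near(lines, needle, window_seconds=60):
    """Return all log entries logged within window_seconds of any entry
    whose message contains needle (hypothetical helper)."""
    entries = []
    for line in lines:
        line = line.strip()
        if not line.startswith("{"):
            continue  # not the start of a JSON entry
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # entry spans several lines; skipped in this sketch
        if "timestamp" in entry:
            entries.append((parse_ts(entry["timestamp"]), entry))

    # Timestamps of the entries we are interested in
    hits = [ts for ts, e in entries if needle in e.get("message", "")]
    window = timedelta(seconds=window_seconds)
    return [e for ts, e in entries
            if any(abs(ts - hit) <= window for hit in hits)]
```

Calling `entries_near(open(path), "All shards failed")` on the server log (the path depends on your Docker setup) would then list everything logged within a minute of each failure, which should make a node start or shutdown easy to spot.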

Extinguo · August 1, 2019, 9:12am

Hi David,
thanks for your answer!
Turns out I misread the timestamp. You are correct, the cluster was rebooted at the time.
Do you know why the shards fail while starting up?

DavidTurner · August 1, 2019, 10:53am

The log message is indicating that a search has failed to search any of the shards it tried. The shards themselves are still starting up at this point, which can take some time.

Arguably the logs are being overly dramatic here: there's no need for such a noisy warning in this case. I opened an issue to discuss this further.
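
If a client wants to avoid searching while the shards are still starting up, one option is to wait on the cluster health API for the index first. The following is only a sketch under a few assumptions: the node is reachable without authentication, the base URL is passed in by the caller, and the helper names (`health_url`, `is_ready`, `wait_until_searchable`) are invented for this example. It relies on the `wait_for_status` and `timeout` parameters of `GET /_cluster/health/<index>`.

```python
import json
import urllib.error
import urllib.parse
import urllib.request


def health_url(base_url, index, wait_for_status="yellow", timeout="30s"):
    """Build the cluster health URL for a single index."""
    query = urllib.parse.urlencode(
        {"wait_for_status": wait_for_status, "timeout": timeout})
    return f"{base_url.rstrip('/')}/_cluster/health/{urllib.parse.quote(index)}?{query}"


def is_ready(health):
    """True if the health response says the primaries are active.
    Yellow or green means every primary shard of the index is assigned."""
    return (not health.get("timed_out", True)
            and health.get("status") in ("yellow", "green"))


def wait_until_searchable(base_url, index, timeout="30s"):
    """Block until the index is at least yellow, or the server-side
    timeout expires. Returns True if the index is ready to search."""
    url = health_url(base_url, index, timeout=timeout)
    try:
        # Client-side timeout is kept longer than the server-side one.
        with urllib.request.urlopen(url, timeout=60) as resp:
            return is_ready(json.load(resp))
    except urllib.error.HTTPError as err:
        if err.code == 408:  # health wait timed out on the server
            return False
        raise
```

For the index in this thread, that would be something like `wait_until_searchable("http://localhost:9200", ".kibana_task_manager")`, with the host and port adjusted to your setup. Kibana issues its own searches, so this only helps for clients you control.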

Extinguo · August 1, 2019, 11:23am

Ok great! Thank you for the clarification.

system · August 29, 2019, 11:23am

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
