Hey guys,
I'm running Elastic 7.2.0 on a single machine in Docker containers. It works fine except for some problems with the shards.
I let Elastic run for a few days and looked at the logs today. They look fine, except for one thing: Elasticsearch threw a lot of "All shards failed" exceptions at several points. Please find the stack trace below:
{"type": "server", "timestamp": "2019-07-30T15:03:57,209+0200", "level": "DEBUG", "component": "o.e.a.s.TransportSearchAction", "cluster.name": "A-Elastic-Stack", "node.name": "es01", "cluster.uuid": "qxxxxxxxxxxxxxx", "node.id": "0xxxxxxxxxxx",  "message": "All shards failed for phase: [query]"  }
{"type": "server", "timestamp": "2019-07-30T15:03:57,213+0200", "level": "WARN", "component": "r.suppressed", "cluster.name": "A-Elastic-Stack", "node.name": "es01", "cluster.uuid": "qxxxxxxxxxxxx", "node.id": "0xxxxxxxxxxxxx",  "message": "path: /.kibana_task_manager/_search, params: {ignore_unavailable=true, index=.kibana_task_manager}" ,
"stacktrace": ["org.elasticsearch.action.search.SearchPhaseExecutionException: all shards failed",
"at org.elasticsearch.action.search.AbstractSearchAsyncAction.onPhaseFailure(AbstractSearchAsyncAction.java:296) [elasticsearch-7.2.0.jar:7.2.0]",
"at org.elasticsearch.action.search.AbstractSearchAsyncAction.executeNextPhase(AbstractSearchAsyncAction.java:139) [elasticsearch-7.2.0.jar:7.2.0]",
"at org.elasticsearch.action.search.AbstractSearchAsyncAction.onPhaseDone(AbstractSearchAsyncAction.java:259) [elasticsearch-7.2.0.jar:7.2.0]",
"at org.elasticsearch.action.search.InitialSearchPhase.onShardFailure(InitialSearchPhase.java:105) [elasticsearch-7.2.0.jar:7.2.0]",
"at org.elasticsearch.action.search.InitialSearchPhase.lambda$performPhaseOnShard$1(InitialSearchPhase.java:251) [elasticsearch-7.2.0.jar:7.2.0]",
"at org.elasticsearch.action.search.InitialSearchPhase$1.doRun(InitialSearchPhase.java:172) [elasticsearch-7.2.0.jar:7.2.0]",
"at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) [elasticsearch-7.2.0.jar:7.2.0]",
"at org.elasticsearch.common.util.concurrent.TimedRunnable.doRun(TimedRunnable.java:44) [elasticsearch-7.2.0.jar:7.2.0]",
"at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:758) [elasticsearch-7.2.0.jar:7.2.0]",
"at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) [elasticsearch-7.2.0.jar:7.2.0]",
"at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) [?:?]",
"at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) [?:?]",
"at java.lang.Thread.run(Thread.java:835) [?:?]"] }
The results of GET /_cluster/health/?level=shards look fine to me. Please find a snippet of the result below:
  "cluster_name" : "A-Elastic-Stack",
  "status" : "green",
  "timed_out" : false,
  "number_of_nodes" : 1,
  "number_of_data_nodes" : 1,
  "active_primary_shards" : 37,
  "active_shards" : 37,
  "relocating_shards" : 0,
  "initializing_shards" : 0,
  "unassigned_shards" : 0,
  "delayed_unassigned_shards" : 0,
  "number_of_pending_tasks" : 0,
  "number_of_in_flight_fetch" : 0,
  "task_max_waiting_in_queue_millis" : 0,
  "active_shards_percent_as_number" : 100.0,
  "indices" : {
    ".monitoring-logstash-7-2019.07.28" : {
      "status" : "green",
      "number_of_shards" : 1,
      "number_of_replicas" : 0,
      "active_primary_shards" : 1,
      "active_shards" : 1,
      "relocating_shards" : 0,
      "initializing_shards" : 0,
      "unassigned_shards" : 0,
      "shards" : {
        "0" : {
          "status" : "green",
          "primary_active" : true,
          "active_shards" : 1,
          "relocating_shards" : 0,
          "initializing_shards" : 0,
          "unassigned_shards" : 0
        }
      }
    },
...
    ".kibana_task_manager" : {
      "status" : "green",
      "number_of_shards" : 1,
      "number_of_replicas" : 0,
      "active_primary_shards" : 1,
      "active_shards" : 1,
      "relocating_shards" : 0,
      "initializing_shards" : 0,
      "unassigned_shards" : 0,
      "shards" : {
        "0" : {
          "status" : "green",
          "primary_active" : true,
          "active_shards" : 1,
          "relocating_shards" : 0,
          "initializing_shards" : 0,
          "unassigned_shards" : 0
        }
      }
    },
All of the shards are green.
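To check this across the whole response rather than by eye, something like the following sketch could be used. It walks the "indices" and "shards" keys shown in the snippet above and lists every shard that is not green. The function names, the base URL http://localhost:9200 and the absence of authentication/TLS are assumptions for illustration; with security enabled the request would need credentials.

```python
import json
import urllib.request
from typing import Any, Dict, List, Tuple


def non_green_shards(health: Dict[str, Any]) -> List[Tuple[str, str, str]]:
    """Return (index, shard id, status) for every shard that is not green.

    `health` is the parsed JSON body of GET /_cluster/health?level=shards.
    """
    problems = []
    for index_name, index_info in health.get("indices", {}).items():
        for shard_id, shard_info in index_info.get("shards", {}).items():
            status = shard_info.get("status", "unknown")
            if status != "green":
                problems.append((index_name, shard_id, status))
    return problems


def fetch_health(base_url: str = "http://localhost:9200") -> Dict[str, Any]:
    """Fetch shard-level cluster health.

    Assumption: the node is reachable at base_url without auth or TLS.
    """
    url = base_url + "/_cluster/health?level=shards"
    with urllib.request.urlopen(url, timeout=10) as resp:
        return json.load(resp)
```

Calling non_green_shards(fetch_health()) returns an empty list when every shard reports green, which is what the snippet above suggests for this cluster at the time of the request. It says nothing about the state at the moment the exceptions were logged.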
I also noticed that Elasticsearch occasionally throws similar errors during startup. However, during startup the outcome varies between "all shards failed", ".security-7 failed", and no exceptions whatsoever.
Can somebody give me a hint what to look at?
Could it be related to the setting auto_expand_replicas, which is set to "0-1"? Is ES trying to create a replica but failing to do so since there are no other instances of ES up and running?
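For anyone who wants to compare, this is a rough sketch of how the replica-related settings could be read per index. It assumes the default (non-flat) response shape of GET /<index>/_settings, where values sit under "settings" -> "index" as strings. The function names and the base URL http://localhost:9200 without authentication are assumptions for illustration. Note that the health output above already reports "number_of_replicas" : 0 for .kibana_task_manager.

```python
import json
import urllib.request
from typing import Any, Dict, Optional


def replica_settings(settings_response: Dict[str, Any]) -> Dict[str, Dict[str, Optional[str]]]:
    """Extract auto_expand_replicas and number_of_replicas for each index.

    `settings_response` is the parsed JSON body of GET /<index>/_settings.
    Missing settings are reported as None.
    """
    result = {}
    for index_name, body in settings_response.items():
        index_settings = body.get("settings", {}).get("index", {})
        result[index_name] = {
            "auto_expand_replicas": index_settings.get("auto_expand_replicas"),
            "number_of_replicas": index_settings.get("number_of_replicas"),
        }
    return result


def fetch_settings(index: str, base_url: str = "http://localhost:9200") -> Dict[str, Any]:
    """Fetch the settings of one index.

    Assumption: the node is reachable at base_url without auth or TLS.
    """
    url = base_url + "/" + index + "/_settings"
    with urllib.request.urlopen(url, timeout=10) as resp:
        return json.load(resp)
```

For example, replica_settings(fetch_settings(".kibana_task_manager")) would show both values side by side for the index named in the stack trace.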
Thanks in advance!