Hey guys,
I'm running the Elastic Stack 7.2.0 on a single machine in Docker containers. It works fine except for some problems with shards.
I let the stack run for a few days and looked at the logs today. They look fine, except for one thing: at certain points Elasticsearch threw a lot of "all shards failed" exceptions. Here is the stack trace:
{"type": "server", "timestamp": "2019-07-30T15:03:57,209+0200", "level": "DEBUG", "component": "o.e.a.s.TransportSearchAction", "cluster.name": "A-Elastic-Stack", "node.name": "es01", "cluster.uuid": "qxxxxxxxxxxxxxx", "node.id": "0xxxxxxxxxxx", "message": "All shards failed for phase: [query]" }
{"type": "server", "timestamp": "2019-07-30T15:03:57,213+0200", "level": "WARN", "component": "r.suppressed", "cluster.name": "A-Elastic-Stack", "node.name": "es01", "cluster.uuid": "qxxxxxxxxxxxx", "node.id": "0xxxxxxxxxxxxx", "message": "path: /.kibana_task_manager/_search, params: {ignore_unavailable=true, index=.kibana_task_manager}" ,
"stacktrace": ["org.elasticsearch.action.search.SearchPhaseExecutionException: all shards failed",
"at org.elasticsearch.action.search.AbstractSearchAsyncAction.onPhaseFailure(AbstractSearchAsyncAction.java:296) [elasticsearch-7.2.0.jar:7.2.0]",
"at org.elasticsearch.action.search.AbstractSearchAsyncAction.executeNextPhase(AbstractSearchAsyncAction.java:139) [elasticsearch-7.2.0.jar:7.2.0]",
"at org.elasticsearch.action.search.AbstractSearchAsyncAction.onPhaseDone(AbstractSearchAsyncAction.java:259) [elasticsearch-7.2.0.jar:7.2.0]",
"at org.elasticsearch.action.search.InitialSearchPhase.onShardFailure(InitialSearchPhase.java:105) [elasticsearch-7.2.0.jar:7.2.0]",
"at org.elasticsearch.action.search.InitialSearchPhase.lambda$performPhaseOnShard$1(InitialSearchPhase.java:251) [elasticsearch-7.2.0.jar:7.2.0]",
"at org.elasticsearch.action.search.InitialSearchPhase$1.doRun(InitialSearchPhase.java:172) [elasticsearch-7.2.0.jar:7.2.0]",
"at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) [elasticsearch-7.2.0.jar:7.2.0]",
"at org.elasticsearch.common.util.concurrent.TimedRunnable.doRun(TimedRunnable.java:44) [elasticsearch-7.2.0.jar:7.2.0]",
"at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:758) [elasticsearch-7.2.0.jar:7.2.0]",
"at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) [elasticsearch-7.2.0.jar:7.2.0]",
"at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) [?:?]",
"at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) [?:?]",
"at java.lang.Thread.run(Thread.java:835) [?:?]"] }
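To see which requests hit these failures and how often, a small script can tally the WARN entries by request path. This is a minimal sketch, assuming the container writes one JSON log entry per line (the entry above is one such entry, wrapped for display) and that the failing path always appears as "path: ..., params: ..." in the message field, as it does above. The function name count_failed_search_paths is hypothetical.

```python
import json
import re
from collections import Counter

# Matches the request path in messages like
# "path: /.kibana_task_manager/_search, params: {...}"
PATH_RE = re.compile(r"path: ([^,]+),")


def count_failed_search_paths(lines):
    """Count 'all shards failed' WARN entries per request path.

    Assumes one JSON log entry per line; non-JSON lines are skipped.
    """
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line.startswith("{"):
            continue  # skip plain-text output mixed into the log
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip truncated or malformed entries
        trace = entry.get("stacktrace") or []
        if entry.get("level") != "WARN" or not trace:
            continue
        # The first stack trace line carries the exception type and message
        if "SearchPhaseExecutionException: all shards failed" not in trace[0]:
            continue
        match = PATH_RE.search(entry.get("message", ""))
        counts[match.group(1) if match else "<unknown>"] += 1
    return counts
```

It accepts any iterable of lines, e.g. an open file holding the saved container log (the file name is up to you); `counts.most_common()` then lists the noisiest paths first. If only Kibana's internal indices such as .kibana_task_manager show up, that narrows down where to look.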
The results of GET /_cluster/health/?level=shards look fine to me. Here is a snippet of the result:
"cluster_name" : "A-Elastic-Stack",
"status" : "green",
"timed_out" : false,
"number_of_nodes" : 1,
"number_of_data_nodes" : 1,
"active_primary_shards" : 37,
"active_shards" : 37,
"relocating_shards" : 0,
"initializing_shards" : 0,
"unassigned_shards" : 0,
"delayed_unassigned_shards" : 0,
"number_of_pending_tasks" : 0,
"number_of_in_flight_fetch" : 0,
"task_max_waiting_in_queue_millis" : 0,
"active_shards_percent_as_number" : 100.0,
"indices" : {
".monitoring-logstash-7-2019.07.28" : {
"status" : "green",
"number_of_shards" : 1,
"number_of_replicas" : 0,
"active_primary_shards" : 1,
"active_shards" : 1,
"relocating_shards" : 0,
"initializing_shards" : 0,
"unassigned_shards" : 0,
"shards" : {
"0" : {
"status" : "green",
"primary_active" : true,
"active_shards" : 1,
"relocating_shards" : 0,
"initializing_shards" : 0,
"unassigned_shards" : 0
}
}
},
...
".kibana_task_manager" : {
"status" : "green",
"number_of_shards" : 1,
"number_of_replicas" : 0,
"active_primary_shards" : 1,
"active_shards" : 1,
"relocating_shards" : 0,
"initializing_shards" : 0,
"unassigned_shards" : 0,
"shards" : {
"0" : {
"status" : "green",
"primary_active" : true,
"active_shards" : 1,
"relocating_shards" : 0,
"initializing_shards" : 0,
"unassigned_shards" : 0
}
}
},
All of the shards are green.
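The health API reports the state at the moment of the request, so a green result now does not show what the shards looked like when the errors were logged. One way to check is to poll the same endpoint and flag any shard that is not green or has no active primary. This is a minimal sketch, assuming the node is reachable at http://localhost:9200 without authentication; fetch_shard_health and non_green_shards are hypothetical helper names.

```python
import json
from urllib.request import urlopen


def fetch_shard_health(base_url="http://localhost:9200"):
    """Fetch GET /_cluster/health?level=shards (assumes no auth/TLS)."""
    with urlopen(f"{base_url}/_cluster/health?level=shards") as resp:
        return json.load(resp)


def non_green_shards(health):
    """Return (index, shard_id, status) for every shard that is not healthy."""
    problems = []
    for index, info in health.get("indices", {}).items():
        for shard_id, shard in info.get("shards", {}).items():
            # A shard is suspicious if it is not green or its primary is inactive
            if shard.get("status") != "green" or not shard.get("primary_active", False):
                problems.append((index, shard_id, shard.get("status")))
    return problems
```

Running `non_green_shards(fetch_shard_health())` in a loop with a short sleep around the times the errors appear would show whether .kibana_task_manager or .security-7 is briefly unavailable, for example while the node is still starting up.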
I also noticed that Elasticsearch occasionally throws similar errors during startup. However, at startup the behavior varies: sometimes I get "all shards failed", sometimes ".security-7 failed", and sometimes no exceptions at all.
Can somebody give me a hint what to look at?
Could it be related to the setting auto_expand_replicas, which is set to "0-1"? Is ES trying to create a replica but failing to do so because there are no other instances of ES up and running?
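For reference, here is a simplified model of how I understand auto_expand_replicas to pick the replica count: one replica per additional data node, clamped to the configured range, with the lower bound winning. This is a sketch for reasoning about the question, not Elasticsearch's actual code, and the function name expected_replicas is hypothetical.

```python
def expected_replicas(setting: str, data_nodes: int) -> int:
    """Simplified model of auto_expand_replicas, e.g. "0-1" or "0-all".

    Replicas = data_nodes - 1, clamped to [low, high]; low takes precedence.
    """
    low_s, high_s = setting.split("-")
    low = int(low_s)
    high = data_nodes - 1 if high_s == "all" else int(high_s)
    return max(low, min(high, data_nodes - 1))
```

Under this model, "0-1" on a single data node gives 0 replicas, which matches the "number_of_replicas" : 0 and "unassigned_shards" : 0 in the health output above, so an unassignable replica would not seem to be the cause here.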
Thanks in advance!