Hi,
We've had a single node running Logstash, Elasticsearch and Kibana for about six months. Although it wasn't intended to be a production system, it's become essential for troubleshooting - we use it to capture firewall logs via syslog.
It was working fine, but at some point in the last few weeks Kibana has begun to time out (or sometimes return a 500 error).
In /var/log/elasticsearch//elasticsearch.log I'm seeing stuff like this:
[2019-09-20T10:00:42,499][DEBUG][o.e.a.s.TransportSearchAction] [h_iLag0] All shards failed for phase: [query]
[2019-09-20T10:00:42,500][WARN ][r.suppressed ] [h_iLag0] path: /.kibana_task_manager/_doc/_search, params: {ignore_unavailable=true, index=.kibana_task_manager, type=_doc}
org.elasticsearch.action.search.SearchPhaseExecutionException: all shards failed
    at org.elasticsearch.action.search.AbstractSearchAsyncAction.onPhaseFailure(AbstractSearchAsyncAction.java:296) ~[elasticsearch-6.8.2.jar:6.8.2]
    at org.elasticsearch.action.search.AbstractSearchAsyncAction.executeNextPhase(AbstractSearchAsyncAction.java:133) ~[elasticsearch-6.8.2.jar:6.8.2]
    at org.elasticsearch.action.search.AbstractSearchAsyncAction.onPhaseDone(AbstractSearchAsyncAction.java:259) ~[elasticsearch-6.8.2.jar:6.8.2]
    at org.elasticsearch.action.search.InitialSearchPhase.onShardFailure(InitialSearchPhase.java:100) ~[elasticsearch-6.8.2.jar:6.8.2]
    at org.elasticsearch.action.search.InitialSearchPhase.lambda$performPhaseOnShard$1(InitialSearchPhase.java:208) ~[elasticsearch-6.8.2.jar:6.8.2]
    at org.elasticsearch.action.search.InitialSearchPhase$1.doRun(InitialSearchPhase.java:187) [elasticsearch-6.8.2.jar:6.8.2]
    at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) [elasticsearch-6.8.2.jar:6.8.2]
    at org.elasticsearch.common.util.concurrent.TimedRunnable.doRun(TimedRunnable.java:41) [elasticsearch-6.8.2.jar:6.8.2]
    at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:751) [elasticsearch-6.8.2.jar:6.8.2]
    at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) [elasticsearch-6.8.2.jar:6.8.2]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_222]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_222]
    at java.lang.Thread.run(Thread.java:748) [?:1.8.0_222]
[2019-09-20T10:00:44,393][WARN ][o.e.m.j.JvmGcMonitorService] [h_iLag0] [gc][58] overhead, spent [1.5s] collecting in the last [1.6s]
I've no idea where to start trying to understand what this is about - can anyone supply some pointers? I've tried simple things like checking disk space and system load, and rebooting.
We're running 6.8.2 on Debian.
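In case it helps anyone reading the log: the JvmGcMonitorService line reports how much of a recent time window was spent in garbage collection. Below is a rough Python sketch for turning such a line into a fraction. The function name gc_overhead_fraction and the regular expression are my own, written against the one line pasted above, so the assumption that other GC overhead lines follow the same shape (with ms, s or m units) may not hold everywhere.

```python
import re

# Unit suffixes assumed to appear in the log; only "s" is seen in the excerpt above.
_UNIT_SECONDS = {"ms": 0.001, "s": 1.0, "m": 60.0}

# Pattern inferred from the single GC overhead line in this post (an assumption).
_GC_RE = re.compile(
    r"overhead, spent \[([\d.]+)(ms|s|m)\] collecting in the last \[([\d.]+)(ms|s|m)\]"
)


def gc_overhead_fraction(line):
    """Return time-in-GC divided by the reported window, or None if no match."""
    m = _GC_RE.search(line)
    if m is None:
        return None
    spent = float(m.group(1)) * _UNIT_SECONDS[m.group(2)]
    window = float(m.group(3)) * _UNIT_SECONDS[m.group(4)]
    if window <= 0:
        return None
    return spent / window


if __name__ == "__main__":
    sample = (
        "[2019-09-20T10:00:44,393][WARN ][o.e.m.j.JvmGcMonitorService] "
        "[h_iLag0] [gc][58] overhead, spent [1.5s] collecting in the last [1.6s]"
    )
    fraction = gc_overhead_fraction(sample)
    print("GC overhead: {:.1%}".format(fraction))
```

For the line above this works out to 1.5s out of 1.6s, so the JVM appears to be spending most of that window collecting garbage rather than serving requests.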