Hi,
I have been facing a strange error where my Elasticsearch instance breaks abruptly. The ELK stack works fine initially, and I am able to push data to an index, but the stack later breaks with the error below. The logs from the Elasticsearch server and Kibana follow. I am looking for some guidance here.
Elasticsearch logs:
[2020-10-26T15:56:16,478][WARN ][o.e.g.G.InternalPrimaryShardAllocator] [xx-xxxx-xxxx.nam.nsroot.net] [logstash-2020.10.26][1]: failed to list shard for shard_started on node [rbumnqK6SsCzxoKABaorZA]
org.elasticsearch.action.FailedNodeException: Failed node [rbumnqK6SsCzxoKABaorZA]
at org.elasticsearch.action.support.nodes.TransportNodesAction$AsyncAction.onFailure(TransportNodesAction.java:236) ~[elasticsearch-6.8.6.jar:6.8.6]
at org.elasticsearch.action.support.nodes.TransportNodesAction$AsyncAction.access$200(TransportNodesAction.java:151) ~[elasticsearch-6.8.6.jar:6.8.6]
at org.elasticsearch.action.support.nodes.TransportNodesAction$AsyncAction$1.handleException(TransportNodesAction.java:210) ~[elasticsearch-6.8.6.jar:6.8.6]
at org.elasticsearch.transport.TransportService$ContextRestoreResponseHandler.handleException(TransportService.java:1114) ~[elasticsearch-6.8.6.jar:6.8.6]
at org.elasticsearch.transport.TransportService$DirectResponseChannel.processException(TransportService.java:1226)
...................
Caused by: org.elasticsearch.transport.RemoteTransportException: [xx-xxxx-xxxx.nam.nsroot.net][10.332.22.123:9300][internal:gateway/local/started_shards[n]]
Caused by: org.elasticsearch.ElasticsearchException: failed to load started shards
at org.elasticsearch.gateway.TransportNodesListGatewayStartedShards.nodeOperation(TransportNodesListGatewayStartedShards.java:169) ~[elasticsearch-6.8.6.jar:6.8.6]
at org.elasticsearch.gateway.TransportNodesListGatewayStartedShards.nodeOperat
.................
... 22 more
Caused by: org.apache.lucene.store.AlreadyClosedException: Underlying file changed by an external force at 2020-10-26T19:33:39Z, (lock=NativeFSLock(path=/data/elasticsearch/nodes/0/node.lock,impl=sun.nio.ch.FileLockImpl[0:9223372036854775807 exclusive valid],creationTime=2020-10-26T15:33:16Z))
at org.apache.lucene.store.NativeFSLockFactory$NativeFSLock.ensureValid(NativeFSLockFactory.java:191) ~[lucene-core-7.7.2.jar:7.7.2 d4c30fc2856154f2c1fefc589eb7cd070a415b94 - janhoy - 2019-05-28 23:30:25]
Kibana logs:
{"type":"log","@timestamp":"2020-10-26T19:59:44Z","tags":["error","task_manager"],"pid":32410,"message":"Failed to poll for work: [security_exception] failed to authenticate user [kibana], with { header={ WWW-Authenticate={ 0="Bearer realm=\"security\"" & 1="ApiKey" & 2="Basic realm=\"security\" charset=\"UTF-8\"" } } } :: {"path":"/.kibana_task_manager/_doc/_search","query":{"ignore_unavailable":true},"body":"{\"query\":{\"bool\":{\"must\":[{\"term\":{\"type\":\"task\"}},{\"bool\":{\"must\":[{\"terms\":{\"task.taskType\":[\"vis_telemetry\"]}},{\"range\":{\"task.attempts\":{\"lte\":3}}},{\"range\":{\"task.runAt\":{\"lte\":\"now\"}}},{\"range\":{\"kibana.apiVersion\":{\"lte\":1}}}]}}]}},\"size\":10,\"sort\":{\"task.runAt\":{\"order\":\"asc\"}},\"seq_no_primary_term\":true}","statusCode":401,"response":"{\"error\":{\"root_cause\":[{\"type\":\"security_exception\",\"reason\":\"failed to authenticate user [kibana]\",\"header\":{\"WWW-Authenticate\":[\"Bearer realm=\\\"security\\\"\",\"ApiKey\",\"Basic realm=\\\"security\\\" charset=\\\"UTF-8\\\"\"]}}],\"type\":\"security_exception\",\"reason\":\"failed to authenticate user [kibana]\",\"header\":{\"WWW-Authenticate\":[\"Bearer realm=\\\"security\\\"\",\"ApiKey\",\"Basic realm=\\\"security\\\" charset=\\\"UTF-8\\\"\"]}},\"status\":401}","wwwAuthenticateDirective":"Bearer realm=\"security\", ApiKey, Basic realm=\"security\" charset=\"UTF-8\""}"}
I have tried reinstalling the ELK stack several times, but each time the setup broke abruptly with the same error.
Thanks in advance!