Elastic Search 6.8.6 breaks abruptly!

jaraws · October 26, 2020, 8:12pm

Hi,

I have been facing a strange error where the Elasticsearch instance breaks abruptly. The ELK stack works fine initially, and I am able to push data to the index, but the stack later breaks with the error below. The Elasticsearch and Kibana logs follow. Looking for some guidance here.

Elasticsearch logs:

[2020-10-26T15:56:16,478][WARN ][o.e.g.G.InternalPrimaryShardAllocator] [xx-xxxx-xxxx.nam.nsroot.net] [logstash-2020.10.26][1]: failed to list shard for shard_started on node [rbumnqK6SsCzxoKABaorZA]
org.elasticsearch.action.FailedNodeException: Failed node [rbumnqK6SsCzxoKABaorZA]
at org.elasticsearch.action.support.nodes.TransportNodesAction$AsyncAction.onFailure(TransportNodesAction.java:236) ~[elasticsearch-6.8.6.jar:6.8.6]
at org.elasticsearch.action.support.nodes.TransportNodesAction$AsyncAction.access$200(TransportNodesAction.java:151) ~[elasticsearch-6.8.6.jar:6.8.6]
at org.elasticsearch.action.support.nodes.TransportNodesAction$AsyncAction$1.handleException(TransportNodesAction.java:210) ~[elasticsearch-6.8.6.jar:6.8.6]
at org.elasticsearch.transport.TransportService$ContextRestoreResponseHandler.handleException(TransportService.java:1114) ~[elasticsearch-6.8.6.jar:6.8.6]
at org.elasticsearch.transport.TransportService$DirectResponseChannel.processException(TransportService.java:1226)
...................
Caused by: org.elasticsearch.transport.RemoteTransportException: [xx-xxxx-xxxx.nam.nsroot.net][10.332.22.123:9300][internal:gateway/local/started_shards[n]]
Caused by: org.elasticsearch.ElasticsearchException: failed to load started shards
at org.elasticsearch.gateway.TransportNodesListGatewayStartedShards.nodeOperation(TransportNodesListGatewayStartedShards.java:169) ~[elasticsearch-6.8.6.jar:6.8.6]
at org.elasticsearch.gateway.TransportNodesListGatewayStartedShards.nodeOperat
.................
... 22 more
Caused by: org.apache.lucene.store.AlreadyClosedException: Underlying file changed by an external force at 2020-10-26T19:33:39Z, (lock=NativeFSLock(path=/data/elasticsearch/nodes/0/node.lock,impl=sun.nio.ch.FileLockImpl[0:9223372036854775807 exclusive valid],creationTime=2020-10-26T15:33:16Z))
at org.apache.lucene.store.NativeFSLockFactory$NativeFSLock.ensureValid(NativeFSLockFactory.java:191) ~[lucene-core-7.7.2.jar:7.7.2 d4c30fc2856154f2c1fefc589eb7cd070a415b94 - janhoy - 2019-05-28 23:30:25]

Kibana logs:

{"type":"log","@timestamp":"2020-10-26T19:59:44Z","tags":["error","task_manager"],"pid":32410,"message":"Failed to poll for work: [security_exception] failed to authenticate user [kibana], with { header={ WWW-Authenticate={ 0="Bearer realm=\"security\"" & 1="ApiKey" & 2="Basic realm=\"security\" charset=\"UTF-8\"" } } } :: {"path":"/.kibana_task_manager/_doc/_search","query":{"ignore_unavailable":true},"body":"{\"query\":{\"bool\":{\"must\":[{\"term\":{\"type\":\"task\"}},{\"bool\":{\"must\":[{\"terms\":{\"task.taskType\":[\"vis_telemetry\"]}},{\"range\":{\"task.attempts\":{\"lte\":3}}},{\"range\":{\"task.runAt\":{\"lte\":\"now\"}}},{\"range\":{\"kibana.apiVersion\":{\"lte\":1}}}]}}]}},\"size\":10,\"sort\":{\"task.runAt\":{\"order\":\"asc\"}},\"seq_no_primary_term\":true}","statusCode":401,"response":"{\"error\":{\"root_cause\":[{\"type\":\"security_exception\",\"reason\":\"failed to authenticate user [kibana]\",\"header\":{\"WWW-Authenticate\":[\"Bearer realm=\\\"security\\\"\",\"ApiKey\",\"Basic realm=\\\"security\\\" charset=\\\"UTF-8\\\"\"]}}],\"type\":\"security_exception\",\"reason\":\"failed to authenticate user [kibana]\",\"header\":{\"WWW-Authenticate\":[\"Bearer realm=\\\"security\\\"\",\"ApiKey\",\"Basic realm=\\\"security\\\" charset=\\\"UTF-8\\\"\"]}},\"status\":401}","wwwAuthenticateDirective":"Bearer realm=\"security\", ApiKey, Basic realm=\"security\" charset=\"UTF-8\""}"}

I have tried reinstalling the ELK stack several times, but each time the setup broke abruptly with the same error.

Thanks in advance!

DavidTurner · October 26, 2020, 8:31pm

Something (not Elasticsearch) is modifying the last-modified time of the node.lock file. Elasticsearch treats this as an indication that it does not have exclusive control over its data path, which can lead to data corruption, and therefore it stops working to protect your data.

The fix is to track down whatever else is altering things in the data path and prevent it from doing so. It's vitally important that Elasticsearch alone is permitted to alter the contents of its data path.
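One way to narrow down when the interference happens is to watch the lock file's timestamp from outside Elasticsearch. The following is only a rough sketch in Python, not anything Elasticsearch ships: the function names `lock_file_mtime` and `watch_for_external_change` are made up for illustration, and it assumes that the last-modified time reported by `os.stat` is the value being disturbed.

```python
import os
import time


def lock_file_mtime(path):
    """Return the last-modified time of `path` in nanoseconds since the epoch."""
    return os.stat(path).st_mtime_ns


def watch_for_external_change(path, baseline_ns, checks=10, interval_s=1.0):
    """Poll `path` up to `checks` times.

    Returns the new mtime (in nanoseconds) as soon as it differs from
    `baseline_ns`, or None if no change was seen. Hypothetical helper,
    not part of Elasticsearch or Lucene.
    """
    for _ in range(checks):
        current_ns = lock_file_mtime(path)
        if current_ns != baseline_ns:
            return current_ns
        time.sleep(interval_s)
    return None
```

To use it, record `lock_file_mtime("/data/elasticsearch/nodes/0/node.lock")` (the path from the log above) right after the node starts, then call `watch_for_external_change` with that baseline. The time at which a change is reported can then be matched against cron jobs, backup agents, or activity on other hosts that mount the same directory.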

jaraws · October 28, 2020, 5:45pm

@DavidTurner, thanks for the input. I was able to resolve my problem using your hint.
The problem with my setup was that I had two nodes in my cluster with the same /data directory mounted on both. As a result, when I installed the ES server on one machine, it broke or corrupted the files of the ES instance running on the other.

Thanks

DavidTurner · October 29, 2020, 8:12am

Thanks, that roughly makes sense, but it does mean that (a) you're using some kind of network-based shared storage and (b) this storage does not implement file locking correctly. Local storage is generally recommended over shared storage: it performs better and tends not to have this kind of correctness issue. Bug-free file locking isn't terribly important to Elasticsearch (except to protect against this kind of setup issue), but its absence would make me worry that the storage doesn't implement other, more important filesystem features correctly either. Tread carefully.
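If it is unclear whether a data path sits on shared storage, the mount table usually tells you. Here is a minimal sketch in Python, assuming a Linux host where `/proc/mounts` is available. The helper names and the `NETWORK_FS_TYPES` set are illustrative only, and the set is certainly not a complete list of network filesystems.

```python
import posixpath

# Filesystem types that usually indicate network-backed storage.
# Illustrative and incomplete; extend it for your environment.
NETWORK_FS_TYPES = {"nfs", "nfs4", "cifs", "smb3", "ceph", "lustre", "fuse.glusterfs"}


def filesystem_type(path, mounts_text):
    """Return (mount_point, fs_type) for the longest mount point containing `path`.

    `mounts_text` is expected in /proc/mounts format:
    device, mount point, filesystem type, options, ...
    """
    path = posixpath.normpath(path)
    best = ("", "")
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        mount_point, fs_type = fields[1], fields[2]
        prefix = mount_point.rstrip("/") + "/"
        if path == mount_point or path.startswith(prefix):
            # Prefer the most specific (longest) matching mount point.
            if len(mount_point) > len(best[0]):
                best = (mount_point, fs_type)
    return best


def is_network_mount(path, mounts_text):
    """True if `path` lives on a filesystem type listed in NETWORK_FS_TYPES."""
    return filesystem_type(path, mounts_text)[1] in NETWORK_FS_TYPES
```

On a Linux node you could read the table with `open("/proc/mounts").read()` and pass it in together with the configured data path, for example `/data/elasticsearch`. A result such as `nfs4` means the path is shared storage, so it is worth checking that no other host mounts and writes to the same directory.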

system · November 26, 2020, 8:12am

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
