Underlying file changed by an external force

Hi Team,

Lately, I've started seeing these errors in my ES 6.8.6 cluster. I've had this cluster since March, but these errors started appearing only recently:

[2020-12-14T00:00:04,122][WARN ][o.e.c.r.a.AllocationService] [node1.foo.bar.com] 
failing shard [failed shard, shard [.kibana_7][0], node[9b3APiVrTliXlxUA4RR3Rg], [R],
s[STARTED], a[id=EwFxAm38RkiMwEgbV7bGaA], message [failed to perform
indices:data/write/bulk[s] on replica [.kibana_7][0], node[9b3APiVrTliXlxUA4RR3Rg], [R],
s[STARTED], a[id=EwFxAm38RkiMwEgbV7bGaA]], failure 
nested: AlreadyClosedException[Underlying file changed by an external force at 2020-12-
exclusive valid],creationTime=2020-12-10T05:58:11.385213Z))]; ], markAsStale [true]]
org.elasticsearch.transport.RemoteTransportException: [node1.foo.bar.com][][indices:data/write/bulk[s][r]]

This happens about once a week, sometimes more often. I initially suspected the Qualys scan agent and disabled it, but the error still appears. The cluster goes into a yellow state but usually recovers on its own without me having to restart it. Sometimes, though, I do have to restart the cluster, or else close and re-open the affected index.

Can anyone shed light on what could be wrong? Is there a way to know which process is modifying the files? I suspect it could be an anti-virus, since folders named with UUIDs might look suspicious to it. But how can I find out which process modifies them?

[root@node1.foo.bar.com]# ll -lrt
total 212
-rw-r--r--. 1 elasticsearch elasticsearch     0 Dec 14 00:00 write.lock

ES cluster: 6.8.6, self-managed.
Data nodes: 55 GB RAM, 8 TB SSDs, 16 cores each.

10 data nodes and 3 master nodes in total.

It's definitely something other than Elasticsearch meddling with Elasticsearch's data. It could well be something like an antivirus program. Pinning down the specific process that's causing your problems is tricky, however, particularly if you've tried disabling the suspects without success. You could try running lsof in a loop in the hope of catching another process looking at Elasticsearch's files.
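A minimal sketch of that lsof loop might look like the following. The data path, log file, and polling interval are assumptions for illustration; adjust them to your environment. The idea is to record any process other than the elasticsearch user that has files open under the data directory:

```shell
#!/bin/sh
# Poll for non-Elasticsearch processes holding files under the ES data path.
# DATA_DIR and LOG are assumptions -- adjust for your setup.
DATA_DIR=/data/disk1/data/nodes/0/indices
LOG=/var/log/es-file-watch.log

while true; do
  # lsof +D recurses into the directory; column 3 of its output is the
  # owning user, so drop the header row and anything owned by elasticsearch.
  hits=$(lsof +D "$DATA_DIR" 2>/dev/null | awk 'NR > 1 && $3 != "elasticsearch"')
  if [ -n "$hits" ]; then
    { date; printf '%s\n' "$hits"; } >> "$LOG"
  fi
  sleep 5
done
```

Note that `lsof +D` on a large shard directory can be slow, and a 5-second poll can still miss a short-lived scanner process, which is why an auditd rule (as in the follow-up below) is the more reliable net.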


Thanks David. I set up the following audit rule:

auditctl -a always,exit -F dir=/data/disk1/data/nodes/0/indices -F perm=wa -F uid!=elasticsearch -k mykey

This monitors any file changes (perm=wa: writes and attribute changes) in the directory /data/disk1/data/nodes/0/indices and its sub-directories, excluding changes made by the elasticsearch user.

I then set up a cron job that checks whether /sbin/ausearch -i --input-logs -k mykey has any output and, if so, triggers an email alert.
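For reference, a sketch of that cron helper, assuming the job runs every 10 minutes so `-ts recent` (ausearch's built-in "last 10 minutes" window) covers the gap between runs. The alert address is a placeholder, and `mail` is assumed to be configured on the host:

```shell
#!/bin/sh
# Cron job: alert if the audit key recorded any non-elasticsearch writes.
# ALERT_TO is a placeholder address -- change it for your environment.
KEY=mykey
ALERT_TO=ops@example.com

# -ts recent limits the search to the last 10 minutes, matching a
# */10 cron schedule so events are reported at most once.
hits=$(/sbin/ausearch -i --input-logs -k "$KEY" -ts recent 2>/dev/null)
if [ -n "$hits" ]; then
  printf '%s\n' "$hits" \
    | mail -s "Non-elasticsearch write under ES indices dir" "$ALERT_TO"
fi
```

With a crontab entry such as `*/10 * * * * /usr/local/sbin/es-audit-alert.sh`, any hit on the audit key produces an email naming the offending syscall, process, and uid.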

Let me know if you have any other suggestions.


This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.