Frequent FileSystemException: Operation not permitted

Hello,

I have an Elasticsearch 6.1.2 cluster, and very often I see the following in the Elasticsearch log:

        [2021-04-15T01:03:38,388][WARN ][o.e.c.a.s.ShardStateAction] [ELK1] [events-202104][2] received shard failed for shard id [[events-202104][2]], allocation id [nDN55tAlR0q-1F_g2ONYYw], primary term [0], message [shard failure, reason [lucene commit failed]], failure [FileSystemException[/storage/nodes/0/indices/0iikkDzuSQirSdM6MmUqKQ/2/index/_7li.cfe: Operation not permitted]]
    java.nio.file.FileSystemException: /storage/nodes/0/indices/0iikkDzuSQirSdM6MmUqKQ/2/index/_7li.cfe: Operation not permitted
            at sun.nio.fs.UnixException.translateToIOException(UnixException.java:91) ~[?:?]
            at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102) ~[?:?]
            at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:107) ~[?:?]
            at sun.nio.fs.UnixFileSystemProvider.newFileChannel(UnixFileSystemProvider.java:177) ~[?:?]
            at java.nio.channels.FileChannel.open(FileChannel.java:287) ~[?:1.8.0_191]
            at java.nio.channels.FileChannel.open(FileChannel.java:335) ~[?:1.8.0_191]
            at org.apache.lucene.util.IOUtils.fsync(IOUtils.java:471) ~[lucene-core-7.1.0.jar:7.1.0 84c90ad2c0218156c840e19a64d72b8a38550659 - ubuntu - 2017-10-13 16:12:42]
            at org.apache.lucene.store.FSDirectory.fsync(FSDirectory.java:327) ~[lucene-core-7.1.0.jar:7.1.0 84c90ad2c0218156c840e19a64d72b8a38550659 - ubuntu - 2017-10-13 16:12:42]
            at org.apache.lucene.store.FSDirectory.sync(FSDirectory.java:285) ~[lucene-core-7.1.0.jar:7.1.0 84c90ad2c0218156c840e19a64d72b8a38550659 - ubuntu - 2017-10-13 16:12:42]
            at org.apache.lucene.store.FilterDirectory.sync(FilterDirectory.java:83) ~[lucene-core-7.1.0.jar:7.1.0 84c90ad2c0218156c840e19a64d72b8a38550659 - ubuntu - 2017-10-13 16:12:42]
            at org.apache.lucene.store.LockValidatingDirectoryWrapper.sync(LockValidatingDirectoryWrapper.java:68) ~[lucene-core-7.1.0.jar:7.1.0 84c90ad2c0218156c840e19a64d72b8a38550659 - ubuntu - 2017-10-13 16:12:42]
            at org.apache.lucene.index.IndexWriter.startCommit(IndexWriter.java:4757) ~[lucene-core-7.1.0.jar:7.1.0 84c90ad2c0218156c840e19a64d72b8a38550659 - ubuntu - 2017-10-13 16:12:42]
            at org.apache.lucene.index.IndexWriter.prepareCommitInternal(IndexWriter.java:3281) ~[lucene-core-7.1.0.jar:7.1.0 84c90ad2c0218156c840e19a64d72b8a38550659 - ubuntu - 2017-10-13 16:12:42]
            at org.apache.lucene.index.IndexWriter.commitInternal(IndexWriter.java:3413) ~[lucene-core-7.1.0.jar:7.1.0 84c90ad2c0218156c840e19a64d72b8a38550659 - ubuntu - 2017-10-13 16:12:42]
            at org.apache.lucene.index.IndexWriter.commit(IndexWriter.java:3378) ~[lucene-core-7.1.0.jar:7.1.0 84c90ad2c0218156c840e19a64d72b8a38550659 - ubuntu - 2017-10-13 16:12:42]
            at org.elasticsearch.index.engine.InternalEngine.commitIndexWriter(InternalEngine.java:2086) ~[ngStorage-6.1.2.jar:6.1.2]
            at org.elasticsearch.index.engine.InternalEngine.commitIndexWriter(InternalEngine.java:2079) ~[ngStorage-6.1.2.jar:6.1.2]
            at org.elasticsearch.index.engine.InternalEngine.syncFlush(InternalEngine.java:1396) ~[ngStorage-6.1.2.jar:6.1.2]
            at org.elasticsearch.index.shard.IndexShard.syncFlush(IndexShard.java:997) ~[ngStorage-6.1.2.jar:6.1.2]
            at org.elasticsearch.indices.flush.SyncedFlushService.performSyncedFlush(SyncedFlushService.java:423) ~[ngStorage-6.1.2.jar:6.1.2]
            at org.elasticsearch.indices.flush.SyncedFlushService.access$1100(SyncedFlushService.java:70) ~[ngStorage-6.1.2.jar:6.1.2]
            at org.elasticsearch.indices.flush.SyncedFlushService$SyncedFlushTransportHandler.messageReceived(SyncedFlushService.java:704) ~[ngStorage-6.1.2.jar:6.1.2]
            at org.elasticsearch.indices.flush.SyncedFlushService$SyncedFlushTransportHandler.messageReceived(SyncedFlushService.java:700) ~[ngStorage-6.1.2.jar:6.1.2]
            at org.elasticsearch.transport.TransportRequestHandler.messageReceived(TransportRequestHandler.java:30) ~[ngStorage-6.1.2.jar:6.1.2]
            at org.elasticsearch.transport.RequestHandlerRegistry.processMessageReceived(RequestHandlerRegistry.java:66) ~[ngStorage-6.1.2.jar:6.1.2]
            at org.elasticsearch.transport.TcpTransport$RequestHandler.doRun(TcpTransport.java:1554) ~[ngStorage-6.1.2.jar:6.1.2]
            at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:637) ~[ngStorage-6.1.2.jar:6.1.2]
            at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) ~[ngStorage-6.1.2.jar:6.1.2]
            at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) ~[?:1.8.0_191]
            at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) ~[?:1.8.0_191]
            at java.lang.Thread.run(Thread.java:748) [?:1.8.0_191]

I've verified disk space, filesystem permissions, and so on, but I have no idea what is happening. Any clue?
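
For reference, this is roughly the kind of check I mean, run as the user that runs Elasticsearch, against the shard path taken from the log above. It is only a minimal sketch in plain java.nio, not something shipped with Elasticsearch, and the class name is just for illustration:

    import java.io.IOException;
    import java.nio.file.FileStore;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.nio.file.Paths;

    public class DataPathCheck {
        public static void main(String[] args) throws IOException {
            // Shard directory taken from the log line above; adjust for your node.
            Path dir = Paths.get("/storage/nodes/0/indices/0iikkDzuSQirSdM6MmUqKQ/2/index");

            // Free space on the filesystem backing the data path.
            FileStore store = Files.getFileStore(dir);
            System.out.printf("fs=%s usable=%d MB%n",
                    store.type(), store.getUsableSpace() / (1024 * 1024));

            // Permissions as seen by the JVM user running this check.
            System.out.println("readable=" + Files.isReadable(dir)
                    + " writable=" + Files.isWritable(dir)
                    + " executable=" + Files.isExecutable(dir));
        }
    }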

Thanks in advance for your help.

What type of storage are you using?

I'm using a simple ext4 partition.

Is that backed by local SSD? Local hard drive? Some kind of networked storage?

It's a virtual disk backed by local SSD on the ESX host.

Sorry, after verification the filesystem is not ext4 but XFS.

FYI, 6.1 has been EOL since 2019-06-13. Please upgrade ASAP.

It sounds to me like a potential issue with the storage, as that should not happen. How is the virtual disk created/set up? How often does the error appear? Is there any pattern to when it happens? Does it happen on all nodes in the cluster?

It always happens during bulk index operations.

The error occurs randomly across all 14 nodes.

I didn't see any specific pattern for that.

How is the virtual disk set up/created? What technologies are you using?

My customer is using VMware vSAN backed by SSDs.

The virtual disks are VMDKs with thick provisioning.

The phrase "Operation not permitted" is an error reported by the OS when it prevents Elasticsearch from opening the named file; specifically, it is the Unix error code EPERM. This isn't an Elasticsearch-specific error: something is wrong with the environment if Elasticsearch can't open files like this.
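
As an illustration of the code path involved (a hypothetical stand-alone probe, not part of Elasticsearch; the file name below is made up): opening and fsyncing a file uses the same java.nio calls that appear in the stack trace above, and an OS-level EPERM surfaces as exactly this java.nio.file.FileSystemException.

    import java.io.IOException;
    import java.nio.channels.FileChannel;
    import java.nio.file.FileSystemException;
    import java.nio.file.Path;
    import java.nio.file.Paths;
    import java.nio.file.StandardOpenOption;

    public class FsyncProbe {
        public static void main(String[] args) {
            // Hypothetical scratch file on the same filesystem as the shard data.
            Path file = Paths.get(args.length > 0 ? args[0] : "/storage/fsync-probe.tmp");
            try (FileChannel ch = FileChannel.open(file,
                    StandardOpenOption.CREATE, StandardOpenOption.WRITE)) {
                ch.force(true); // fsync, as Lucene's IOUtils.fsync does via FileChannel
                System.out.println("open + fsync ok: " + file);
            } catch (FileSystemException e) {
                // EPERM from open() or fsync() is translated into this exception,
                // with the reason string "Operation not permitted".
                System.err.println(e.getFile() + ": " + e.getReason());
            } catch (IOException e) {
                System.err.println("other I/O failure: " + e);
            }
        }
    }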

Found the problem: the customer had installed Kaspersky antivirus on all the nodes.

Thanks for your help.
