Elasticsearch (ES) is deployed on an Azure VM Scale Set (VMSS) of Windows VMs. On some VMs it throws java.io.IOException: "The device is not ready" when creating shards, while other VMs in the same scale set keep working normally at the same time.
Here is what the exception looks like:
TraceLevel="WARN" ComponentName="default" Message="[2023-11-27T21:24:34,775][WARN ][org.elasticsearch.env.NodeEnvironment] lock assertion failed
java.io.IOException: The device is not ready
at sun.nio.ch.FileDispatcherImpl.size0(Native Method)
at sun.nio.ch.FileDispatcherImpl.size(FileDispatcherImpl.java:101)
at sun.nio.ch.FileChannelImpl.size(FileChannelImpl.java:310)
at org.apache.lucene.store.NativeFSLockFactory$NativeFSLock.ensureValid(NativeFSLockFactory.java:170)
at org.elasticsearch.env.NodeEnvironment.assertEnvIsLocked(NodeEnvironment.java:941)
at org.elasticsearch.env.NodeEnvironment.nodePaths(NodeEnvironment.java:766)
at org.elasticsearch.monitor.fs.FsProbe.stats(FsProbe.java:55)
at org.elasticsearch.monitor.fs.FsService.stats(FsService.java:60)
at org.elasticsearch.monitor.fs.FsService.access$200(FsService.java:33)
at org.elasticsearch.monitor.fs.FsService$FsInfoCache.refresh(FsService.java:78)
at org.elasticsearch.monitor.fs.FsService$FsInfoCache.refresh(FsService.java:67)
at org.elasticsearch.common.util.SingleObjectCache.getOrRefresh(SingleObjectCache.java:54)
at org.elasticsearch.monitor.fs.FsService.stats(FsService.java:55)
at org.elasticsearch.node.NodeService.stats(NodeService.java:110)
at org.elasticsearch.action.admin.cluster.node.stats.TransportNodesStatsAction.nodeOperation(TransportNodesStatsAction.java:77)
at org.elasticsearch.action.admin.cluster.node.stats.TransportNodesStatsAction.nodeOperation(TransportNodesStatsAction.java:42)
at org.elasticsearch.action.support.nodes.TransportNodesAction.nodeOperation(TransportNodesAction.java:140)
at org.elasticsearch.action.support.nodes.TransportNodesAction$NodeTransportHandler.messageReceived(TransportNodesAction.java:262)
at org.elasticsearch.action.support.nodes.TransportNodesAction$NodeTransportHandler.messageReceived(TransportNodesAction.java:258)
at org.elasticsearch.transport.RequestHandlerRegistry.processMessageReceived(RequestHandlerRegistry.java:69)
at org.elasticsearch.transport.TcpTransport$RequestHandler.doRun(TcpTransport.java:1556)
at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:674)
at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:750)
All the VMs have identical configurations. When I RDP to an affected VM, I can see the "node.lock" file in "K:\esdata\nodes\0", where it's supposed to be. The device seems ready to me - I can create other files in that folder and can even move the "node.lock" file to another folder and back.
(K: is a data disk associated with that VM exclusively, not a shared location.)
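To check the failing call outside of Elasticsearch, something like the following could be run on an affected VM. It is only a minimal sketch: the class name LockFileProbe and the method probeSize are made up for illustration, and the default path is simply the one from my setup. It opens a fresh read-only handle on the file and calls FileChannel.size(), which is the call at the top of the stack trace.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

// Hypothetical diagnostic helper, not part of Elasticsearch or Lucene.
public class LockFileProbe {

    // Opens a new read-only channel on the given file and returns its size.
    // This goes through the same FileChannel.size() call that fails in the trace.
    static long probeSize(Path lockFile) throws IOException {
        try (FileChannel channel = FileChannel.open(lockFile, StandardOpenOption.READ)) {
            return channel.size();
        }
    }

    public static void main(String[] args) {
        // Default path is an assumption taken from this post; pass another path as the first argument.
        Path lockFile = Paths.get(args.length > 0 ? args[0] : "K:\\esdata\\nodes\\0\\node.lock");
        try {
            System.out.println("size() returned " + probeSize(lockFile));
        } catch (IOException e) {
            System.out.println("size() failed: " + e);
        }
    }
}
```

Note that this uses a newly opened handle, whereas the ES process holds a handle that was opened when the node started. If this probe succeeds while ES still logs the exception, that would only suggest the problem is tied to the long-lived handle (for example, if the disk went away and came back underneath it), not prove it.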
Restarting the VM solves the problem in most cases, but it recurs after some time.
What could be the root cause and how should I fix it?