A file written by master to the store cannot be accessed on the node

I am using Elasticsearch 7.8.0.
Most of the time my backups work fine, but occasionally I get the following error and the backup fails:

[2021-06-02T09:50:25,297][WARN ][r.suppressed             ] [node-172.24.90.194] path: /_snapshot/MXklJnlJQvmn5snoMgbYzw_nms_db_active_1016315967, params: {master_timeout=30s, repository=MXklJnlJQvmn5snoMgbYzw_nms_db_active_1016315967, timeout=30s}
org.elasticsearch.repositories.RepositoryVerificationException: [MXklJnlJQvmn5snoMgbYzw_nms_db_active_1016315967] [[iXj7aOiXTQSagzYnT5p5Bg, 'RemoteTransportException[[node-172.24.90.199][172.24.90.199:9311][internal:admin/repository/verify]]; nested: RepositoryVerificationException[[MXklJnlJQvmn5snoMgbYzw_nms_db_active_1016315967] a file written by master to the store [/home/vnv_lab/nfs_client_share/MXklJnlJQvmn5snoMgbYzw_nms_db_active_1016315967] cannot be accessed on the node [{node-172.24.90.199}{iXj7aOiXTQSagzYnT5p5Bg}{vi-MQaGuTgea93Z-4AOlWQ}{172.24.90.199}{172.24.90.199:9311}{dimrt}{xpack.installed=true, transform.node=true}]. This might indicate that the store [/home/vnv_lab/nfs_client_share/MXklJnlJQvmn5snoMgbYzw_nms_db_active_1016315967] is not shared between this node and the master node or that permissions on the store don't allow reading files written by the master node]; nested: NoSuchFileException[/home/vnv_lab/nfs_client_share/MXklJnlJQvmn5snoMgbYzw_nms_db_active_1016315967/tests-yRRk-kZXQ3anBMDXI1NvPQ/master.dat];']]
	at org.elasticsearch.repositories.VerifyNodeRepositoryAction.finishVerification(VerifyNodeRepositoryAction.java:118) ~[elasticsearch-7.8.0.jar:7.8.0]
	at org.elasticsearch.repositories.VerifyNodeRepositoryAction.access$000(VerifyNodeRepositoryAction.java:49) ~[elasticsearch-7.8.0.jar:7.8.0]
	at org.elasticsearch.repositories.VerifyNodeRepositoryAction$1.handleException(VerifyNodeRepositoryAction.java:107) ~[elasticsearch-7.8.0.jar:7.8.0]
	at org.elasticsearch.transport.TransportService$ContextRestoreResponseHandler.handleException(TransportService.java:1173) ~[elasticsearch-7.8.0.jar:7.8.0]
	at org.elasticsearch.transport.TransportService$ContextRestoreResponseHandler.handleException(TransportService.java:1173) ~[elasticsearch-7.8.0.jar:7.8.0]
	at org.elasticsearch.transport.InboundHandler.lambda$handleException$2(InboundHandler.java:235) ~[elasticsearch-7.8.0.jar:7.8.0]
	at org.elasticsearch.common.util.concurrent.EsExecutors$DirectExecutorService.execute(EsExecutors.java:226) ~[elasticsearch-7.8.0.jar:7.8.0]
	at org.elasticsearch.transport.InboundHandler.handleException(InboundHandler.java:233) ~[elasticsearch-7.8.0.jar:7.8.0]
	at org.elasticsearch.transport.InboundHandler.handlerResponseError(InboundHandler.java:225) ~[elasticsearch-7.8.0.jar:7.8.0]
	at org.elasticsearch.transport.InboundHandler.messageReceived(InboundHandler.java:115) ~[elasticsearch-7.8.0.jar:7.8.0]
	at org.elasticsearch.transport.InboundHandler.inboundMessage(InboundHandler.java:78) ~[elasticsearch-7.8.0.jar:7.8.0]
	at org.elasticsearch.transport.TcpTransport.inboundMessage(TcpTransport.java:692) ~[elasticsearch-7.8.0.jar:7.8.0]
	at org.elasticsearch.transport.InboundPipeline.forwardFragments(InboundPipeline.java:142) [elasticsearch-7.8.0.jar:7.8.0]
	at org.elasticsearch.transport.InboundPipeline.doHandleBytes(InboundPipeline.java:117) [elasticsearch-7.8.0.jar:7.8.0]
	at org.elasticsearch.transport.InboundPipeline.handleBytes(InboundPipeline.java:82) [elasticsearch-7.8.0.jar:7.8.0]
	at org.elasticsearch.transport.netty4.Netty4MessageChannelHandler.channelRead(Netty4MessageChannelHandler.java:73) [transport-netty4-client-7.8.0.jar:7.8.0]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379) [netty-transport-4.1.49.Final.jar:4.1.49.Final]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365) [netty-transport-4.1.49.Final.jar:4.1.49.Final]
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357) [netty-transport-4.1.49.Final.jar:4.1.49.Final]
	at io.netty.handler.logging.LoggingHandler.channelRead(LoggingHandler.java:271) [netty-handler-4.1.49.Final.jar:4.1.49.Final]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379) [netty-transport-4.1.49.Final.jar:4.1.49.Final]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365) [netty-transport-4.1.49.Final.jar:4.1.49.Final]
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357) [netty-transport-4.1.49.Final.jar:4.1.49.Final]
	at io.netty.handler.ssl.SslHandler.unwrap(SslHandler.java:1518) [netty-handler-4.1.49.Final.jar:4.1.49.Final]
	at io.netty.handler.ssl.SslHandler.decodeJdkCompatible(SslHandler.java:1267) [netty-handler-4.1.49.Final.jar:4.1.49.Final]
	at io.netty.handler.ssl.SslHandler.decode(SslHandler.java:1314) [netty-handler-4.1.49.Final.jar:4.1.49.Final]
	at io.netty.handler.codec.ByteToMessageDecoder.decodeRemovalReentryProtection(ByteToMessageDecoder.java:501) [netty-codec-4.1.49.Final.jar:4.1.49.Final]
	at io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:440) [netty-codec-4.1.49.Final.jar:4.1.49.Final]
	at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:276) [netty-codec-4.1.49.Final.jar:4.1.49.Final]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379) [netty-transport-4.1.49.Final.jar:4.1.49.Final]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365) [netty-transport-4.1.49.Final.jar:4.1.49.Final]
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357) [netty-transport-4.1.49.Final.jar:4.1.49.Final]
	at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1410) [netty-transport-4.1.49.Final.jar:4.1.49.Final]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379) [netty-transport-4.1.49.Final.jar:4.1.49.Final]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365) [netty-transport-4.1.49.Final.jar:4.1.49.Final]
	at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:919) [netty-transport-4.1.49.Final.jar:4.1.49.Final]
	at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:163) [netty-transport-4.1.49.Final.jar:4.1.49.Final]
	at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:714) [netty-transport-4.1.49.Final.jar:4.1.49.Final]
	at io.netty.channel.nio.NioEventLoop.processSelectedKeysPlain(NioEventLoop.java:615) [netty-transport-4.1.49.Final.jar:4.1.49.Final]
	at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:578) [netty-transport-4.1.49.Final.jar:4.1.49.Final]
	at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:493) [netty-transport-4.1.49.Final.jar:4.1.49.Final]
	at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989) [netty-common-4.1.49.Final.jar:4.1.49.Final]
	at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74) [netty-common-4.1.49.Final.jar:4.1.49.Final]
	at java.lang.Thread.run(Thread.java:748) [?:1.8.0_242]
2021-06-02 09:50:25,301 ERROR {pool-13619-thread-1} [no.nera.ngnms.sysman.bsl.cmd.database.es.backup.task.ESBackupTask] Error encountered 
ElasticsearchStatusException[Elasticsearch exception [type=repository_verification_exception, reason=[MXklJnlJQvmn5snoMgbYzw_nms_db_active_1016315967] [[iXj7aOiXTQSagzYnT5p5Bg, 'RemoteTransportException[[node-172.24.90.199][172.24.90.199:9311][internal:admin/repository/verify]]; nested: RepositoryVerificationException[[MXklJnlJQvmn5snoMgbYzw_nms_db_active_1016315967] a file written by master to the store [/home/vnv_lab/nfs_client_share/MXklJnlJQvmn5snoMgbYzw_nms_db_active_1016315967] cannot be accessed on the node [{node-172.24.90.199}{iXj7aOiXTQSagzYnT5p5Bg}{vi-MQaGuTgea93Z-4AOlWQ}{172.24.90.199}{172.24.90.199:9311}{dimrt}{xpack.installed=true, transform.node=true}]. This might indicate that the store [/home/vnv_lab/nfs_client_share/MXklJnlJQvmn5snoMgbYzw_nms_db_active_1016315967] is not shared between this node and the master node or that permissions on the store don't allow reading files written by the master node]; nested: NoSuchFileException[/home/vnv_lab/nfs_client_share/MXklJnlJQvmn5snoMgbYzw_nms_db_active_1016315967/tests-yRRk-kZXQ3anBMDXI1NvPQ/master.dat];']]]]
	at org.elasticsearch.rest.BytesRestResponse.errorFromXContent(BytesRestResponse.java:177)
	at org.elasticsearch.client.RestHighLevelClient.parseEntity(RestHighLevelClient.java:1897)
	at org.elasticsearch.client.RestHighLevelClient.parseResponseException(RestHighLevelClient.java:1867)
	at org.elasticsearch.client.RestHighLevelClient.internalPerformRequest(RestHighLevelClient.java:1624)
	at org.elasticsearch.client.RestHighLevelClient.performRequest(RestHighLevelClient.java:1581)
	at org.elasticsearch.client.RestHighLevelClient.performRequestAndParseEntity(RestHighLevelClient.java:1551)
	at org.elasticsearch.client.SnapshotClient.createRepository(SnapshotClient.java:101)
	at com.telecom.nms.server.common.elasticsearch.impl.ESRepositoryService.createRepository(ESRepositoryService.java:145)
	at com.telecom.nms.server.common.elasticsearch.impl.ESAdminClientImpl.createRepository(ESAdminClientImpl.java:479)
	at no.nera.ngnms.sysman.es.client.ESClientGatewayImpl.createRepository(ESClientGatewayImpl.java:291)
	at no.nera.ngnms.sysman.es.ClusterClientFacadeImpl.reAddRepository(ClusterClientFacadeImpl.java:257)
	at no.nera.ngnms.sysman.bsl.cmd.database.es.backup.task.ESBackupTask.call(ESBackupTask.java:49)
	at no.nera.ngnms.sysman.bsl.cmd.database.es.backup.task.ESBackupTask.call(ESBackupTask.java:1)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
	Suppressed: org.elasticsearch.client.ResponseException: method [PUT], host [https://172.24.90.194:9211], URI [/_snapshot/MXklJnlJQvmn5snoMgbYzw_nms_db_active_1016315967?master_timeout=30s&timeout=30s], status line [HTTP/1.1 500 Internal Server Error]
{"error":{"root_cause":[{"type":"repository_verification_exception","reason":"[MXklJnlJQvmn5snoMgbYzw_nms_db_active_1016315967] [[iXj7aOiXTQSagzYnT5p5Bg, 'RemoteTransportException[[node-172.24.90.199][172.24.90.199:9311][internal:admin/repository/verify]]; nested: RepositoryVerificationException[[MXklJnlJQvmn5snoMgbYzw_nms_db_active_1016315967] a file written by master to the store [/home/vnv_lab/nfs_client_share/MXklJnlJQvmn5snoMgbYzw_nms_db_active_1016315967] cannot be accessed on the node [{node-172.24.90.199}{iXj7aOiXTQSagzYnT5p5Bg}{vi-MQaGuTgea93Z-4AOlWQ}{172.24.90.199}{172.24.90.199:9311}{dimrt}{xpack.installed=true, transform.node=true}]. This might indicate that the store [/home/vnv_lab/nfs_client_share/MXklJnlJQvmn5snoMgbYzw_nms_db_active_1016315967] is not shared between this node and the master node or that permissions on the store don't allow reading files written by the master node]; nested: NoSuchFileException[/home/vnv_lab/nfs_client_share/MXklJnlJQvmn5snoMgbYzw_nms_db_active_1016315967/tests-yRRk-kZXQ3anBMDXI1NvPQ/master.dat];']]"}],"type":"repository_verification_exception","reason":"[MXklJnlJQvmn5snoMgbYzw_nms_db_active_1016315967] [[iXj7aOiXTQSagzYnT5p5Bg, 'RemoteTransportException[[node-172.24.90.199][172.24.90.199:9311][internal:admin/repository/verify]]; nested: RepositoryVerificationException[[MXklJnlJQvmn5snoMgbYzw_nms_db_active_1016315967] a file written by master to the store [/home/vnv_lab/nfs_client_share/MXklJnlJQvmn5snoMgbYzw_nms_db_active_1016315967] cannot be accessed on the node [{node-172.24.90.199}{iXj7aOiXTQSagzYnT5p5Bg}{vi-MQaGuTgea93Z-4AOlWQ}{172.24.90.199}{172.24.90.199:9311}{dimrt}{xpack.installed=true, transform.node=true}]. 
This might indicate that the store [/home/vnv_lab/nfs_client_share/MXklJnlJQvmn5snoMgbYzw_nms_db_active_1016315967] is not shared between this node and the master node or that permissions on the store don't allow reading files written by the master node]; nested: NoSuchFileException[/home/vnv_lab/nfs_client_share/MXklJnlJQvmn5snoMgbYzw_nms_db_active_1016315967/tests-yRRk-kZXQ3anBMDXI1NvPQ/master.dat];']]"},"status":500}
		at org.elasticsearch.client.RestClient.convertResponse(RestClient.java:283)
		at org.elasticsearch.client.RestClient.performRequest(RestClient.java:261)
		at org.elasticsearch.client.RestClient.performRequest(RestClient.java:235)
		at org.elasticsearch.client.RestHighLevelClient.internalPerformRequest(RestHighLevelClient.java:1611)
		... 13 more
org.elasticsearch.ElasticsearchStatusException: Elasticsearch exception [type=repository_verification_exception, reason=[MXklJnlJQvmn5snoMgbYzw_nms_db_active_1016315967] [[iXj7aOiXTQSagzYnT5p5Bg, 'RemoteTransportException[[node-172.24.90.199][172.24.90.199:9311][internal:admin/repository/verify]]; nested: RepositoryVerificationException[[MXklJnlJQvmn5snoMgbYzw_nms_db_active_1016315967] a file written by master to the store [/home/vnv_lab/nfs_client_share/MXklJnlJQvmn5snoMgbYzw_nms_db_active_1016315967] cannot be accessed on the node [{node-172.24.90.199}{iXj7aOiXTQSagzYnT5p5Bg}{vi-MQaGuTgea93Z-4AOlWQ}{172.24.90.199}{172.24.90.199:9311}{dimrt}{xpack.installed=true, transform.node=true}]. This might indicate that the store [/home/vnv_lab/nfs_client_share/MXklJnlJQvmn5snoMgbYzw_nms_db_active_1016315967] is not shared between this node and the master node or that permissions on the store don't allow reading files written by the master node]; nested: NoSuchFileException[/home/vnv_lab/nfs_client_share/MXklJnlJQvmn5snoMgbYzw_nms_db_active_1016315967/tests-yRRk-kZXQ3anBMDXI1NvPQ/master.dat];']]]

If I try one more time, the backup is performed successfully.
I need some advice, please.

This indicates that your snapshot repository doesn't have the read-after-write semantics that Elasticsearch needs to work correctly: node-172.24.90.199 could not see a file that the master had just written. That puts your snapshot data at risk.
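To observe this kind of visibility delay outside Elasticsearch, something like the following sketch may help. It is not how Elasticsearch verifies a repository; the class name `ReadAfterWriteProbe`, the `write`/`read` modes, and the 10 s / 50 ms timings are all made up for illustration. The idea is to write a file on the master's host and immediately poll for it on the other node's host through the same shared mount.

```java
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.OptionalLong;

// Hypothetical helper, not part of Elasticsearch.
public final class ReadAfterWriteProbe {

    /**
     * Polls until the file exists or the timeout elapses.
     * Returns the elapsed milliseconds at the moment the file was first seen,
     * or an empty value if it never became visible in time.
     */
    public static OptionalLong waitUntilVisible(Path file, long timeoutMillis, long pollMillis)
            throws InterruptedException {
        long start = System.nanoTime();
        while (true) {
            long elapsedMillis = (System.nanoTime() - start) / 1_000_000;
            if (Files.exists(file)) {
                return OptionalLong.of(elapsedMillis);
            }
            if (elapsedMillis >= timeoutMillis) {
                return OptionalLong.empty();
            }
            Thread.sleep(pollMillis);
        }
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 2) {
            System.err.println("usage: ReadAfterWriteProbe write|read <path-on-shared-mount>");
            System.exit(2);
        }
        Path file = Paths.get(args[1]);
        if ("write".equals(args[0])) {
            // Run this on the host of the master node.
            Files.write(file, "probe".getBytes(StandardCharsets.UTF_8));
            System.out.println("wrote " + file);
        } else {
            // Run this on the other node's host right after the write.
            OptionalLong seen = waitUntilVisible(file, 10_000, 50);
            System.out.println(seen.isPresent()
                    ? "visible after " + seen.getAsLong() + " ms"
                    : "not visible within 10 s");
        }
    }
}
```

If the `read` side regularly reports a noticeable delay, or sometimes misses the file entirely, the share is not giving you read-after-write consistency. On an NFS mount, client-side caching is one plausible cause, but that is an assumption to check against your mount options rather than a diagnosis.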

Version 7.8 has now passed EOL, and newer versions have better tools to analyse your repository's behaviour, so I recommend that you upgrade and then use those tools to investigate your repository more thoroughly.
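One such tool, as far as I know, is the repository analysis API (`POST /_snapshot/<repository>/_analyze`), which is available from 7.12 onwards and therefore not in 7.8. Below is a rough sketch of calling it with plain `HttpURLConnection`. The class name `RepositoryAnalysisCall` and the parameter values are only illustrative, and authentication and TLS trust setup are left out, so a secured cluster like yours will need more than this.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URL;

// Hypothetical helper; only the endpoint and query parameters come from Elasticsearch.
public final class RepositoryAnalysisCall {

    /** Builds the request path for the repository analysis API. */
    public static String analyzePath(String repository, int blobCount,
                                     String maxBlobSize, String timeout) {
        // Restrict the name to characters that need no URL escaping (a simplification).
        if (!repository.matches("[A-Za-z0-9_.-]+")) {
            throw new IllegalArgumentException("unsupported repository name: " + repository);
        }
        return "/_snapshot/" + repository + "/_analyze"
                + "?blob_count=" + blobCount
                + "&max_blob_size=" + maxBlobSize
                + "&timeout=" + timeout;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: RepositoryAnalysisCall <base-url> <repository>");
            System.exit(2);
        }
        // Small values for a first run; increase them for a more thorough analysis.
        URL url = new URL(args[0] + analyzePath(args[1], 10, "1mb", "120s"));
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setRequestMethod("POST");
        conn.setDoOutput(true);
        conn.getOutputStream().close(); // send an empty request body
        System.out.println("HTTP " + conn.getResponseCode());
    }
}
```

A failure response from this call would point to the same kind of consistency problem that the verification error above hints at, but with much more detail about which operation went wrong.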


Thank you! :slight_smile: