Dear all, I hope you can help me understand an issue we are having when taking a snapshot.
We have a cluster (Elasticsearch version 7.10.1) of 4 nodes (1 ingest and 3 data/master) running as services on Windows Server machines.
It holds 61 indices, each with 5 shards and 1 replica.
We store our snapshots in an Azure repository; we take one every hour and keep only the last 5.
This process worked properly for a long time, but we are now getting the following error:
"data_streams": [],
"include_global_state": true,
"state": "PARTIAL",
"start_time": "2021-09-20T08:23:00.980Z",
"start_time_in_millis": 1632126180980,
"end_time": "2021-09-21T01:31:14.882Z",
"end_time_in_millis": 1632187874882,
"duration_in_millis": 61693902,
"failures": [
{
"index": "itemreaddetailactivities_all",
"index_uuid": "itemreaddetailactivities_all",
"shard_id": 4,
"reason": "IndexShardSnapshotFailedException[Failed to write shard level snapshot metadata for [prod_bak_202109200823010923/tg-_OeaqQBWIJQo58rVeZA] to [index-ddeEpGCtSAqGXhhU4ifX-A]]; nested: IOException[Can not write blob index-ddeEpGCtSAqGXhhU4ifX-A]; nested: StorageException[]; nested: UnknownHostException[xyz.blob.core.windows.net]",
"node_id": "tBawpgWfSo-IvqwOASjjcQ",
"status": "INTERNAL_SERVER_ERROR"
}
I included only one failure record above; there is a similar record for each index, with the same reason but a different shard ID.
"shards": {
"total": 305,
"failed": 111,
"successful": 194
}
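Since each failure record carries a node_id, it may be worth counting the failures per node to see whether they concentrate on one machine. This is only a minimal sketch, assuming the snapshot info JSON has already been parsed into a Python dict; the function name and the sample data are hypothetical, and only the field names come from the response above.

```python
from collections import Counter


def failures_per_node(snapshot: dict) -> Counter:
    """Count failed shards per node_id in one snapshot info object."""
    return Counter(
        failure.get("node_id", "unknown")
        for failure in snapshot.get("failures", [])
    )


# Hypothetical sample using the same field names as the response above.
sample = {
    "failures": [
        {"index": "index_a", "shard_id": 4, "node_id": "node-1"},
        {"index": "index_b", "shard_id": 0, "node_id": "node-1"},
        {"index": "index_c", "shard_id": 2, "node_id": "node-2"},
    ]
}

# Print nodes ordered by number of failed shards.
for node_id, count in failures_per_node(sample).most_common():
    print(node_id, count)
```

If most failures map to a single node_id, that node's network or DNS configuration would be the first thing to look at.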
The log file shows the following exception details:
[2021-09-21T00:08:10,104][WARN ][o.e.s.SnapshotShardsService] [data_node_01] [[items_2016][3]][my_backup_azure_production:prod_bak_202109200823010923/tg-_OeaqQBWIJQo58rVeZA] failed to snapshot shard
org.elasticsearch.index.snapshots.IndexShardSnapshotFailedException: Failed to write shard level snapshot metadata for [prod_bak_202109200823010923/tg-_OeaqQBWIJQo58rVeZA] to [index-fIce999wRSSpG_Rp6_ccdQ]
at org.elasticsearch.repositories.blobstore.BlobStoreRepository.snapshotShard(BlobStoreRepository.java:2009) [elasticsearch-7.10.1.jar:7.10.1]
at org.elasticsearch.snapshots.SnapshotShardsService.snapshot(SnapshotShardsService.java:344) [elasticsearch-7.10.1.jar:7.10.1]
at org.elasticsearch.snapshots.SnapshotShardsService.lambda$startNewShards$1(SnapshotShardsService.java:260) [elasticsearch-7.10.1.jar:7.10.1]
at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:678) [elasticsearch-7.10.1.jar:7.10.1]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1130) [?:?]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:630) [?:?]
at java.lang.Thread.run(Thread.java:832) [?:?]
Caused by: java.io.IOException: Can not write blob index-fIce999wRSSpG_Rp6_ccdQ
at org.elasticsearch.repositories.azure.AzureBlobContainer.writeBlob(AzureBlobContainer.java:117) ~[?:?]
at org.elasticsearch.repositories.blobstore.ChecksumBlobStoreFormat.write(ChecksumBlobStoreFormat.java:146) ~[elasticsearch-7.10.1.jar:7.10.1]
at org.elasticsearch.repositories.blobstore.BlobStoreRepository.snapshotShard(BlobStoreRepository.java:2005) ~[elasticsearch-7.10.1.jar:7.10.1]
... 6 more
Caused by: com.microsoft.azure.storage.StorageException:
at com.microsoft.azure.storage.StorageException.translateException(StorageException.java:87) ~[?:?]
at com.microsoft.azure.storage.core.ExecutionEngine.executeWithRetry(ExecutionEngine.java:220) ~[?:?]
at com.microsoft.azure.storage.blob.CloudBlockBlob.uploadFullBlob(CloudBlockBlob.java:1035) ~[?:?]
at com.microsoft.azure.storage.blob.CloudBlockBlob.upload(CloudBlockBlob.java:864) ~[?:?]
at com.microsoft.azure.storage.blob.CloudBlockBlob.upload(CloudBlockBlob.java:743) ~[?:?]
at org.elasticsearch.repositories.azure.AzureBlobStore.lambda$writeBlob$18(AzureBlobStore.java:339) ~[?:?]
at org.elasticsearch.repositories.azure.SocketAccess.lambda$doPrivilegedVoidException$0(SocketAccess.java:69) ~[?:?]
at java.security.AccessController.doPrivileged(AccessController.java:554) ~[?:?]
at org.elasticsearch.repositories.azure.SocketAccess.doPrivilegedVoidException(SocketAccess.java:68) ~[?:?]
at org.elasticsearch.repositories.azure.AzureBlobStore.writeBlob(AzureBlobStore.java:338) ~[?:?]
at org.elasticsearch.repositories.azure.AzureBlobContainer.writeBlob(AzureBlobContainer.java:115) ~[?:?]
at org.elasticsearch.repositories.blobstore.ChecksumBlobStoreFormat.write(ChecksumBlobStoreFormat.java:146) ~[elasticsearch-7.10.1.jar:7.10.1]
at org.elasticsearch.repositories.blobstore.BlobStoreRepository.snapshotShard(BlobStoreRepository.java:2005) ~[elasticsearch-7.10.1.jar:7.10.1]
... 6 more
Caused by: java.net.UnknownHostException: transferfileforelastic.blob.core.windows.net
at sun.nio.ch.NioSocketImpl.connect(NioSocketImpl.java:567) ~[?:?]
at java.net.Socket.connect(Socket.java:648) ~[?:?]
at sun.net.NetworkClient.doConnect(NetworkClient.java:177) ~[?:?]
at sun.net.www.http.HttpClient.openServer(HttpClient.java:474) ~[?:?]
at sun.net.www.http.HttpClient.openServer(HttpClient.java:569) ~[?:?]
at sun.net.www.protocol.https.HttpsClient.<init>(HttpsClient.java:265) ~[?:?]
at sun.net.www.protocol.https.HttpsClient.New(HttpsClient.java:372) ~[?:?]
at sun.net.www.protocol.https.AbstractDelegateHttpsURLConnection.getNewHttpClient(AbstractDelegateHttpsURLConnection.java:189) ~[?:?]
at sun.net.www.protocol.http.HttpURLConnection.plainConnect0(HttpURLConnection.java:1194) ~[?:?]
at sun.net.www.protocol.http.HttpURLConnection.plainConnect(HttpURLConnection.java:1082) ~[?:?]
at sun.net.www.protocol.https.AbstractDelegateHttpsURLConnection.connect(AbstractDelegateHttpsURLConnection.java:175) ~[?:?]
at sun.net.www.protocol.http.HttpURLConnection.getOutputStream0(HttpURLConnection.java:1375) ~[?:?]
at sun.net.www.protocol.http.HttpURLConnection.getOutputStream(HttpURLConnection.java:1350) ~[?:?]
at sun.net.www.protocol.https.HttpsURLConnectionImpl.getOutputStream(HttpsURLConnectionImpl.java:220) ~[?:?]
at com.microsoft.azure.storage.core.ExecutionEngine.executeWithRetry(ExecutionEngine.java:100) ~[?:?]
at com.microsoft.azure.storage.blob.CloudBlockBlob.uploadFullBlob(CloudBlockBlob.java:1035) ~[?:?]
at com.microsoft.azure.storage.blob.CloudBlockBlob.upload(CloudBlockBlob.java:864) ~[?:?]
at com.microsoft.azure.storage.blob.CloudBlockBlob.upload(CloudBlockBlob.java:743) ~[?:?]
at org.elasticsearch.repositories.azure.AzureBlobStore.lambda$writeBlob$18(AzureBlobStore.java:339) ~[?:?]
at org.elasticsearch.repositories.azure.SocketAccess.lambda$doPrivilegedVoidException$0(SocketAccess.java:69) ~[?:?]
at java.security.AccessController.doPrivileged(AccessController.java:554) ~[?:?]
at org.elasticsearch.repositories.azure.SocketAccess.doPrivilegedVoidException(SocketAccess.java:68) ~[?:?]
at org.elasticsearch.repositories.azure.AzureBlobStore.writeBlob(AzureBlobStore.java:338) ~[?:?]
at org.elasticsearch.repositories.azure.AzureBlobContainer.writeBlob(AzureBlobContainer.java:115) ~[?:?]
at org.elasticsearch.repositories.blobstore.ChecksumBlobStoreFormat.write(ChecksumBlobStoreFormat.java:146) ~[elasticsearch-7.10.1.jar:7.10.1]
at org.elasticsearch.repositories.blobstore.BlobStoreRepository.snapshotShard(BlobStoreRepository.java:2005) ~[elasticsearch-7.10.1.jar:7.10.1]
... 6 more
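The innermost cause is an UnknownHostException, so one check that could be run on each node is whether the storage account host name resolves at all. This is a minimal sketch, assuming Python is available on the nodes; the function name can_resolve and the injectable resolver parameter are hypothetical helpers for illustration, not part of Elasticsearch or the Azure plugin.

```python
import socket


def can_resolve(host: str, port: int = 443, resolver=socket.getaddrinfo) -> bool:
    """Return True if the host name resolves to at least one address."""
    try:
        return len(resolver(host, port)) > 0
    except socket.gaierror:
        # Name resolution failed, the Python analogue of UnknownHostException.
        return False
```

Calling can_resolve with the storage account host from the exception, repeatedly over a few hours on every data node, might show whether resolution fails only intermittently or only on some nodes, which would fit a snapshot where some shards succeed and others fail.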
If you need more details, please ask. Thanks.