Elasticsearch snapshots to Azure Blob Storage suddenly fail with a strange error

I have multiple environments on Azure, and each environment runs a multi-node Elasticsearch cluster.

Everything had been working fine for months, and then ES snapshots suddenly stopped working with a strange error. The response returned during a snapshot is below:

{
  "error" : {
    "root_cause" : [
      {
        "type" : "no_such_element_exception",
        "reason" : "An error occurred while enumerating the result, check the original exception for details."
      }
    ],
    "type" : "no_such_element_exception",
    "reason" : "An error occurred while enumerating the result, check the original exception for details.",
    "caused_by" : {
      "type" : "storage_exception",
      "reason" : "The specified account does not exist.",
      "caused_by" : {
        "type" : "null_pointer_exception",
        "reason" : null
      }
    }
  },
  "status" : 500
}
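For reference, the request that produces this response can be reconstructed from the log output further down, where the path and params show the repository (esbackuprepository) and snapshot (data01) names:

```console
PUT /_snapshot/esbackuprepository/data01?pretty
```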

I verified that all permissions and the key are the same; there has been no change to any Blob permissions. I even tried restarting the ES master node, but no luck.

If I run a blob upload command from the same server to the same Blob container, it works fine, which suggests there is no permission issue.
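A complementary check from the cluster's side (rather than from the VM) is the repository verify API, which has the master node write a small test blob (master.dat) to the repository and so exercises the same code path that fails here:

```console
POST /_snapshot/esbackuprepository/_verify
```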

I enabled debug logging for the Azure ES snapshot plugin to see the actual error, but the output is not very meaningful.

The ES logs say:

[2019-09-29T05:07:09,971][WARN ][r.suppressed ] [afqLC7_] path: /_snapshot/esbackuprepository/data01, params: {pretty=, repository=esbackuprepository, snapshot=data01}
java.util.NoSuchElementException: An error occurred while enumerating the result, check the original exception for details.
    at com.microsoft.azure.storage.core.LazySegmentedIterator.hasNext(LazySegmentedIterator.java:113) ~[?:?]
    at org.elasticsearch.cloud.azure.storage.AzureStorageService.lambda$listBlobsByPrefix$14(AzureStorageService.java:227) ~[?:?]
    at org.elasticsearch.cloud.azure.blobstore.util.SocketAccess.lambda$doPrivilegedVoidException$0(SocketAccess.java:64) ~[?:?]
    at java.security.AccessController.doPrivileged(Native Method) ~[?:1.8.0_201]
    at org.elasticsearch.cloud.azure.blobstore.util.SocketAccess.doPrivilegedVoidException(SocketAccess.java:63) ~[?:?]
    at org.elasticsearch.cloud.azure.storage.AzureStorageService.listBlobsByPrefix(AzureStorageService.java:226) ~[?:?]
    at org.elasticsearch.cloud.azure.blobstore.AzureBlobStore.listBlobsByPrefix(AzureBlobStore.java:119) ~[?:?]
    at org.elasticsearch.cloud.azure.blobstore.AzureBlobContainer.listBlobsByPrefix(AzureBlobContainer.java:120) ~[?:?]
    at org.elasticsearch.repositories.blobstore.BlobStoreRepository.listBlobsToGetLatestIndexId(BlobStoreRepository.java:822) ~[elasticsearch-6.5.4.jar:6.5.4]
    at org.elasticsearch.repositories.blobstore.BlobStoreRepository.latestIndexBlobId(BlobStoreRepository.java:800) ~[elasticsearch-6.5.4.jar:6.5.4]
    at org.elasticsearch.repositories.blobstore.BlobStoreRepository.getRepositoryData(BlobStoreRepository.java:663) ~[elasticsearch-6.5.4.jar:6.5.4]
    at org.elasticsearch.snapshots.SnapshotsService.createSnapshot(SnapshotsService.java:235) ~[elasticsearch-6.5.4.jar:6.5.4]
    at org.elasticsearch.action.admin.cluster.snapshots.create.TransportCreateSnapshotAction.masterOperation(TransportCreateSnapshotAction.java:83) ~[elasticsearch-6.5.4.jar:6.5.4]
    at org.elasticsearch.action.admin.cluster.snapshots.create.TransportCreateSnapshotAction.masterOperation(TransportCreateSnapshotAction.java:41) ~[elasticsearch-6.5.4.jar:6.5.4]
    at org.elasticsearch.action.support.master.TransportMasterNodeAction.masterOperation(TransportMasterNodeAction.java:108) ~[elasticsearch-6.5.4.jar:6.5.4]
    at org.elasticsearch.action.support.master.TransportMasterNodeAction$AsyncSingleAction$2.doRun(TransportMasterNodeAction.java:195) ~[elasticsearch-6.5.4.jar:6.5.4]
    at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:723) [elasticsearch-6.5.4.jar:6.5.4]
    at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) [elasticsearch-6.5.4.jar:6.5.4]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_201]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_201]
    at java.lang.Thread.run(Thread.java:748) [?:1.8.0_201]
Caused by: com.microsoft.azure.storage.StorageException: The specified account does not exist.
    at com.microsoft.azure.storage.StorageException.translateException(StorageException.java:87) ~[?:?]
    at com.microsoft.azure.storage.core.ExecutionEngine.executeWithRetry(ExecutionEngine.java:209) ~[?:?]
    at com.microsoft.azure.storage.core.LazySegmentedIterator.hasNext(LazySegmentedIterator.java:109) ~[?:?]
    ... 20 more
Caused by: java.lang.NullPointerException
    at com.microsoft.azure.storage.core.ExecutionEngine.executeWithRetry(ExecutionEngine.java:189) ~[?:?]
    at com.microsoft.azure.storage.core.LazySegmentedIterator.hasNext(LazySegmentedIterator.java:109) ~[?:?]
    ... 20 more

Even switching to a different backup container fails, with this message:

{
  "error": {
    "root_cause": [
      {
        "type": "repository_verification_exception",
        "reason": "[esbackuprepository] path [engg-az-dev2] is not accessible on master node"
      }
    ],
    "type": "repository_verification_exception",
    "reason": "[esbackuprepository] path [engg-az-dev2] is not accessible on master node",
    "caused_by": {
      "type": "i_o_exception",
      "reason": "Can not write blob master.dat",
      "caused_by": {
        "type": "storage_exception",
        "reason": "The specified account does not exist.",
        "caused_by": {
          "type": "null_pointer_exception",
          "reason": null
        }
      }
    }
  },
  "status": 500
}
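One way to rule out stale repository state is to re-register the repository with the same settings. A minimal sketch for the 6.x Azure plugin follows; the base_path is taken from the path shown in the error above, while the container name is a placeholder, not a value from this thread:

```console
PUT /_snapshot/esbackuprepository
{
  "type": "azure",
  "settings": {
    "container": "your-backup-container",
    "base_path": "engg-az-dev2"
  }
}
```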

Any thoughts?

@sunilthemaster it appears the problem is that your Azure storage account is not being found. Can you try creating a new storage account and testing with that?

@Armin_Braun - The storage account credentials did not change. After restarting ES (with no change to the yml file), it connects again.

Also, from the same VM where the snapshot is failing, I am able to connect to the Blob container (using an az CLI command).

@sunilthemaster so after a node restart the situation was fixed and the ES node can connect to Azure now?

Yes.

@sunilthemaster ah good to hear. I think what happened here then was likely that the Azure credentials from the keystore weren't yet reloaded on the nodes (you have to restart to reload them after putting them in the keystore)?

@Armin_Braun : We provided the Azure storage credentials directly in the elasticsearch config file, not via the keystore.
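For context, the legacy Azure plugin visible in the stack trace (the org.elasticsearch.cloud.azure.storage package) accepted credentials declared directly in elasticsearch.yml. A sketch of that format is below, with placeholder values; note that elasticsearch.yml is only read at startup, which is consistent with a restart being needed to pick up any credential change:

```yaml
# elasticsearch.yml - legacy Azure storage settings (6.x, deprecated)
# "my_account" is an arbitrary client label; account and key are placeholders
cloud.azure.storage.my_account.account: your_storage_account_name
cloud.azure.storage.my_account.key: your_storage_account_key
```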

All the nodes of the cluster had been up and running for more than 180 days, and backups were succeeding every 4 hours.

To fix the issue, we just had to restart all ES nodes.
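When every node has to be restarted, the usual approach is a rolling restart so the cluster stays available. A minimal sketch of the standard allocation-toggling steps (general ES practice, not specific to this thread):

```console
# before stopping a node, disable shard allocation to avoid needless rebalancing
PUT /_cluster/settings
{
  "transient": { "cluster.routing.allocation.enable": "none" }
}

# restart the node, wait for it to rejoin the cluster, then re-enable allocation
PUT /_cluster/settings
{
  "transient": { "cluster.routing.allocation.enable": "all" }
}
```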
