Elasticsearch snapshots to Azure Blob Storage suddenly fail with a strange error

I have multiple environments on Azure, and each environment runs a multi-node Elasticsearch cluster.

Everything had been working fine for months, and then ES snapshots suddenly stopped working with a strange error. The response returned during a snapshot is below:

{
  "error" : {
    "root_cause" : [
      {
        "type" : "no_such_element_exception",
        "reason" : "An error occurred while enumerating the result, check the original exception for details."
      }
    ],
    "type" : "no_such_element_exception",
    "reason" : "An error occurred while enumerating the result, check the original exception for details.",
    "caused_by" : {
      "type" : "storage_exception",
      "reason" : "The specified account does not exist.",
      "caused_by" : {
        "type" : "null_pointer_exception",
        "reason" : null
      }
    }
  },
  "status" : 500
}
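For reference, the request that produces this response can be reconstructed from the log output further down, where the path and params show the repository (esbackuprepository) and snapshot (data01) names:

```console
PUT /_snapshot/esbackuprepository/data01?pretty
```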

I verified that all permissions and the key are the same; there has been no change to any Blob permissions. I even tried restarting the ES master node, but no luck.

If I run a blob upload command from the same server to the same Blob container, it works fine, which suggests there is no permission issue.
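A complementary check from the cluster's side (rather than from the VM) is the repository verify API, which has the master node write a small test blob (master.dat) to the repository and so exercises the same code path that fails here:

```console
POST /_snapshot/esbackuprepository/_verify
```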

I enabled debug logging for the Azure ES snapshot plugin to see the actual error, but the output is not very meaningful.

The ES logs say:

[2019-09-29T05:07:09,971][WARN ][r.suppressed ] [afqLC7_] path: /_snapshot/esbackuprepository/data01, params: {pretty=, repository=esbackuprepository, snapshot=data01}
java.util.NoSuchElementException: An error occurred while enumerating the result, check the original exception for details.
    at com.microsoft.azure.storage.core.LazySegmentedIterator.hasNext(LazySegmentedIterator.java:113) ~[?:?]
    at org.elasticsearch.cloud.azure.storage.AzureStorageService.lambda$listBlobsByPrefix$14(AzureStorageService.java:227) ~[?:?]
    at org.elasticsearch.cloud.azure.blobstore.util.SocketAccess.lambda$doPrivilegedVoidException$0(SocketAccess.java:64) ~[?:?]
    at java.security.AccessController.doPrivileged(Native Method) ~[?:1.8.0_201]
    at org.elasticsearch.cloud.azure.blobstore.util.SocketAccess.doPrivilegedVoidException(SocketAccess.java:63) ~[?:?]
    at org.elasticsearch.cloud.azure.storage.AzureStorageService.listBlobsByPrefix(AzureStorageService.java:226) ~[?:?]
    at org.elasticsearch.cloud.azure.blobstore.AzureBlobStore.listBlobsByPrefix(AzureBlobStore.java:119) ~[?:?]
    at org.elasticsearch.cloud.azure.blobstore.AzureBlobContainer.listBlobsByPrefix(AzureBlobContainer.java:120) ~[?:?]
    at org.elasticsearch.repositories.blobstore.BlobStoreRepository.listBlobsToGetLatestIndexId(BlobStoreRepository.java:822) ~[elasticsearch-6.5.4.jar:6.5.4]
    at org.elasticsearch.repositories.blobstore.BlobStoreRepository.latestIndexBlobId(BlobStoreRepository.java:800) ~[elasticsearch-6.5.4.jar:6.5.4]
    at org.elasticsearch.repositories.blobstore.BlobStoreRepository.getRepositoryData(BlobStoreRepository.java:663) ~[elasticsearch-6.5.4.jar:6.5.4]
    at org.elasticsearch.snapshots.SnapshotsService.createSnapshot(SnapshotsService.java:235) ~[elasticsearch-6.5.4.jar:6.5.4]
    at org.elasticsearch.action.admin.cluster.snapshots.create.TransportCreateSnapshotAction.masterOperation(TransportCreateSnapshotAction.java:83) ~[elasticsearch-6.5.4.jar:6.5.4]
    at org.elasticsearch.action.admin.cluster.snapshots.create.TransportCreateSnapshotAction.masterOperation(TransportCreateSnapshotAction.java:41) ~[elasticsearch-6.5.4.jar:6.5.4]
    at org.elasticsearch.action.support.master.TransportMasterNodeAction.masterOperation(TransportMasterNodeAction.java:108) ~[elasticsearch-6.5.4.jar:6.5.4]
    at org.elasticsearch.action.support.master.TransportMasterNodeAction$AsyncSingleAction$2.doRun(TransportMasterNodeAction.java:195) ~[elasticsearch-6.5.4.jar:6.5.4]
    at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:723) [elasticsearch-6.5.4.jar:6.5.4]
    at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) [elasticsearch-6.5.4.jar:6.5.4]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_201]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_201]
    at java.lang.Thread.run(Thread.java:748) [?:1.8.0_201]
Caused by: com.microsoft.azure.storage.StorageException: The specified account does not exist.
    at com.microsoft.azure.storage.StorageException.translateException(StorageException.java:87) ~[?:?]
    at com.microsoft.azure.storage.core.ExecutionEngine.executeWithRetry(ExecutionEngine.java:209) ~[?:?]
    at com.microsoft.azure.storage.core.LazySegmentedIterator.hasNext(LazySegmentedIterator.java:109) ~[?:?]
    ... 20 more
Caused by: java.lang.NullPointerException
    at com.microsoft.azure.storage.core.ExecutionEngine.executeWithRetry(ExecutionEngine.java:189) ~[?:?]
    at com.microsoft.azure.storage.core.LazySegmentedIterator.hasNext(LazySegmentedIterator.java:109) ~[?:?]
    ... 20 more

Even switching to a different backup container fails, with this message:

{
  "error": {
    "root_cause": [
      {
        "type": "repository_verification_exception",
        "reason": "[esbackuprepository] path [engg-az-dev2] is not accessible on master node"
      }
    ],
    "type": "repository_verification_exception",
    "reason": "[esbackuprepository] path [engg-az-dev2] is not accessible on master node",
    "caused_by": {
      "type": "i_o_exception",
      "reason": "Can not write blob master.dat",
      "caused_by": {
        "type": "storage_exception",
        "reason": "The specified account does not exist.",
        "caused_by": {
          "type": "null_pointer_exception",
          "reason": null
        }
      }
    }
  },
  "status": 500
}
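One way to rule out stale repository state is to re-register the repository with the same settings. A minimal sketch for the 6.x Azure plugin follows; the base_path is taken from the path shown in the error above, while the container name is a placeholder, not a value from this thread:

```console
PUT /_snapshot/esbackuprepository
{
  "type": "azure",
  "settings": {
    "container": "your-backup-container",
    "base_path": "engg-az-dev2"
  }
}
```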

Any thoughts?

@sunilthemaster it appears the problem is that your Azure storage account is not being found. Can you try creating a new storage account and testing with that?

@Armin_Braun - The storage account credentials did not change. After restarting ES (with no change to the yml file), it connects again.

Also, from the same VM where the snapshot is failing, I am able to connect to the Blob container (using an az CLI command).

@sunilthemaster so after a node restart the situation was fixed and the ES node can connect to Azure now?

Yes.

@sunilthemaster ah good to hear. I think what happened here then was likely that the Azure credentials from the keystore weren't yet reloaded on the nodes (you have to restart to reload them after putting them in the keystore)?

@Armin_Braun : We provided the Azure storage credentials directly in the elasticsearch config file, not via the keystore.
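For context, the legacy Azure plugin visible in the stack trace (the org.elasticsearch.cloud.azure.storage package) accepted credentials declared directly in elasticsearch.yml. A sketch of that format is below, with placeholder values; note that elasticsearch.yml is only read at startup, which is consistent with a restart being needed to pick up any credential change:

```yaml
# elasticsearch.yml - legacy Azure storage settings (6.x, deprecated)
# "my_account" is an arbitrary client label; account and key are placeholders
cloud.azure.storage.my_account.account: your_storage_account_name
cloud.azure.storage.my_account.key: your_storage_account_key
```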

All the nodes of the cluster had been up and running for more than 180 days, and backups were succeeding every 4 hours.

To fix the issue, we just had to restart all ES nodes.
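When every node has to be restarted, the usual approach is a rolling restart so the cluster stays available. A minimal sketch of the standard allocation-toggling steps (general ES practice, not specific to this thread):

```console
# before stopping a node, disable shard allocation to avoid needless rebalancing
PUT /_cluster/settings
{
  "transient": { "cluster.routing.allocation.enable": "none" }
}

# restart the node, wait for it to rejoin the cluster, then re-enable allocation
PUT /_cluster/settings
{
  "transient": { "cluster.routing.allocation.enable": "all" }
}
```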
