Restore snapshot checksum problem (Troubleshooting corruption)

Unfortunately, for various reasons it's not that easy for us to upgrade to 8.17 (at least not in the next quarter), but we'll definitely consider upgrading to a more recent version of ES.

Here is the error.

{
  "error": {
    "root_cause": [
      {
        "type": "repository_verification_exception",
        "reason": "[transaction-log] register [test-register-9UxQpoW3T4CnILP7csWraw] should have value [10] but instead had value [OptionalBytesReference[MISSING]]",
        "suppressed": [
          {
            "type": "repository_verification_exception",
            "reason": "[transaction-log] failure processing [blob analysis [transaction-log: temp-analysis-r190NeMWQh-3xSzWrTdkJg/test-blob-20-XYRw2e1nSSepBrqBojm7ZQ, length=32768, seed=7696665789984602171, readEarly=false, writeAndOverwrite=false, abortWrite=false]]"
          },
          {
            "type": "repository_verification_exception",
            "reason": "[transaction-log] failure processing [blob analysis [transaction-log: temp-analysis-r190NeMWQh-3xSzWrTdkJg/test-blob-17-WNV_jknSTkyskYpb3f3xpw, length=262144, seed=5055718760755192637, readEarly=false, writeAndOverwrite=false, abortWrite=false]]"
          },
          {
            "type": "repository_verification_exception",
            "reason": "[transaction-log] failure processing [blob analysis [transaction-log: temp-analysis-r190NeMWQh-3xSzWrTdkJg/test-blob-14-IcUPELNyQriW8GwieptCVA, length=1048576, seed=-5087821163508988974, readEarly=false, writeAndOverwrite=false, abortWrite=false]]"
          },
          {
            "type": "repository_verification_exception",
            "reason": "[transaction-log] failure processing [blob analysis [transaction-log: temp-analysis-r190NeMWQh-3xSzWrTdkJg/test-blob-18-ITETVXF1Qta7mbSt9dnPEg, length=256, seed=225571677622846146, readEarly=false, writeAndOverwrite=false, abortWrite=false]]"
          },
          {
            "type": "repository_verification_exception",
            "reason": "[transaction-log] failure processing [blob analysis [transaction-log: temp-analysis-r190NeMWQh-3xSzWrTdkJg/test-blob-12-8nAYwSvDS2qMg9RY4xh2Ug, length=4194304, seed=-8174703883422034556, readEarly=false, writeAndOverwrite=false, abortWrite=false]]"
          }
        ]
      }
    ],
    "type": "repository_verification_exception",
    "reason": "[transaction-log] analysis failed, you may need to manually remove [temp-analysis-r190NeMWQh-3xSzWrTdkJg]",
    "caused_by": {
      "type": "repository_verification_exception",
      "reason": "[transaction-log] register [test-register-9UxQpoW3T4CnILP7csWraw] should have value [10] but instead had value [OptionalBytesReference[MISSING]]",
      "suppressed": [
        {
          "type": "repository_verification_exception",
          "reason": "[transaction-log] failure processing [blob analysis [transaction-log: temp-analysis-r190NeMWQh-3xSzWrTdkJg/test-blob-20-XYRw2e1nSSepBrqBojm7ZQ, length=32768, seed=7696665789984602171, readEarly=false, writeAndOverwrite=false, abortWrite=false]]",
          "caused_by": {
            "type": "execution_cancelled_exception",
            "reason": "operation was cancelled reason [task failed]",
            "suppressed": [
              {
                "type": "repository_verification_exception",
                "reason": "[transaction-log] blob upload cancelled at position [0+32768/32768]"
              }
            ]
          }
        }
      ]
    }
  },
  "status": 500
}
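For anyone triaging a response like this, the useful details are the temporary directory the message says may need manual removal and the blobs whose analysis failed. A minimal Python sketch that pulls those out of the parsed response follows. The function name `summarize_analysis_failure` and the two regular expressions are illustrative assumptions based only on the message shapes visible in this error, not on any documented format, so they may need adjusting for other versions.

```python
import re

# Patterns inferred from the error text in this thread (an assumption, not a
# documented format).
TEMP_DIR_RE = re.compile(r"manually remove \[([^\]]+)\]")
BLOB_RE = re.compile(r"blob analysis \[[^:]+: ([^,]+), length=(\d+)")


def summarize_analysis_failure(response):
    """Return the leftover temp directory and the (blob path, length) pairs
    named in the suppressed root-cause exceptions."""
    error = response["error"]
    match = TEMP_DIR_RE.search(error.get("reason", ""))
    temp_dir = match.group(1) if match else None

    failed_blobs = []
    for cause in error.get("root_cause", []):
        for suppressed in cause.get("suppressed", []):
            blob = BLOB_RE.search(suppressed.get("reason", ""))
            if blob:
                failed_blobs.append((blob.group(1), int(blob.group(2))))
    return {"temp_dir": temp_dir, "failed_blobs": failed_blobs}


# Trimmed copy of the response above, kept to two suppressed entries.
sample = {
    "error": {
        "root_cause": [
            {
                "type": "repository_verification_exception",
                "suppressed": [
                    {"reason": "[transaction-log] failure processing [blob analysis [transaction-log: temp-analysis-r190NeMWQh-3xSzWrTdkJg/test-blob-20-XYRw2e1nSSepBrqBojm7ZQ, length=32768, seed=7696665789984602171, readEarly=false, writeAndOverwrite=false, abortWrite=false]]"},
                    {"reason": "[transaction-log] failure processing [blob analysis [transaction-log: temp-analysis-r190NeMWQh-3xSzWrTdkJg/test-blob-18-ITETVXF1Qta7mbSt9dnPEg, length=256, seed=225571677622846146, readEarly=false, writeAndOverwrite=false, abortWrite=false]]"},
                ],
            }
        ],
        "type": "repository_verification_exception",
        "reason": "[transaction-log] analysis failed, you may need to manually remove [temp-analysis-r190NeMWQh-3xSzWrTdkJg]",
    },
    "status": 500,
}

summary = summarize_analysis_failure(sample)
```

The `temp_dir` value is the directory to look for in the repository's storage if cleanup did not happen automatically.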

Yes, this error definitely suggests anomalous behaviour.

You don't necessarily need to upgrade your production cluster to 8.17, but it would be good if you could try the same tests using a standalone 8.17 node.
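To repeat the test against a standalone node, the request to send is a `POST` to the repository analysis endpoint, `/_snapshot/<repository>/_analyze`. Here is a small Python sketch that only builds that request and does not send it. The node address `http://localhost:9200`, the helper name `build_analyze_request`, and the parameter values are assumptions for illustration; the repository name `transaction-log` is taken from the error above. Please check the parameters against the documentation for the version you are testing.

```python
from urllib.parse import quote, urlencode


def build_analyze_request(base_url, repository, **params):
    """Build (method, url) for a repository analysis call without sending it."""
    path = f"/_snapshot/{quote(repository, safe='')}/_analyze"
    query = urlencode(params)  # keeps the keyword-argument order
    url = base_url.rstrip("/") + path
    if query:
        url += "?" + query
    return "POST", url


# Illustrative values only; tune them for the storage being tested.
method, url = build_analyze_request(
    "http://localhost:9200",  # assumed address of the standalone test node
    "transaction-log",        # repository name as it appears in the error
    blob_count=10,
    max_blob_size="1mb",
    timeout="120s",
)
```

Running the same analysis with the same parameters on both versions makes the two results directly comparable.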

OK, this is an Elastic forum, so we concentrate on matters relating to Elastic products, code, usage, and so on. Your issue certainly falls into the interesting-puzzle category, and I'm curious what will resolve the mystery.

BUT, if you are making un-restorable snapshots, and you have no other way to recover, IMO you need to fix that first, somehow. Whether that means using S3 or another cloud service, a different NFS appliance, or whatever else, NOT having tested, restorable backups of some sort is … suboptimal.

Couldn't agree more. Thanks for your input, though.

I'll keep you posted if any new information comes up.