Unknown issue with Elastic snapshots

I will provide a bit of backstory here, as I think it could be relevant. I run several EC2-based Elasticsearch clusters in AWS, all backed up via snapshots to AWS S3. I recently created two new clusters in us-east-2. The snapshot S3 buckets were originally created in the wrong region by accident. I attempted to add a snapshot repository to cluster1 and did not realize the bucket was in the wrong region until I started getting a transport error (we use an S3 endpoint inside the VPC).

I opened an internal ticket and had the buckets deleted and re-created approximately three days ago. Now I am getting a very strange error from cluster1 that I have never seen before.
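For context, the repository registration that triggers it is along these lines (submitted from Kibana Dev Tools; this is a sketch using the same settings shown further down, just without the region):

PUT _snapshot/qm-na6
{
    "type": "s3",
    "settings": {
        "bucket": "qm-elastic-snapshot-na6",
        "canned_acl": "bucket-owner-full-control",
        "storage_class": "standard_ia"
    }
}

The verification response comes back as: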

{
    "error": {
        "root_cause": [
            {
                "type": "repository_verification_exception",
                "reason": "[qm-na6] path  is not accessible on master node"
            }
        ],
        "type": "repository_verification_exception",
        "reason": "[qm-na6] path  is not accessible on master node",
        "caused_by": {
            "type": "i_o_exception",
            "reason": "Unable to upload object [tests-vxHMJk-aTeGhpFXN_PREEQ/master.dat] using a single upload",
            "caused_by": {
                "type": "amazon_s3_exception",
                "reason": "amazon_s3_exception: The authorization header is malformed; the region 'us-east-1' is wrong; expecting 'us-east-2' (Service: Amazon S3; Status Code: 400; Error Code: AuthorizationHeaderMalformed; Request ID: 1XS6JH52NW4BWK6T; S3 Extended Request ID: ZEh5U6D7U0kUd5GGiWnngLU5xmcbIAwHOF7wWSaYxLJSNJ8/w5n7K19bAlykLAXmeFdeE8VYlfw=)"
            }
        }
    },
    "status": 500
}

All of my EC2/Elasticsearch cluster setup is managed by Terraform/Ansible, so every cluster uses a consistent configuration and IAM role to allow access to the S3 buckets. In fact, cluster2 uses the exact same IAM role and works without issue. Here is the relevant policy:

{
    "Statement": [
        {
            "Action": [
                "s3:ListBucket",
                "s3:GetBucketLocation",
                "s3:ListBucketMultipartUploads",
                "s3:ListBucketVersions"
            ],
            "Effect": "Allow",
            "Resource": [
                "arn:aws:s3:::bucket1",
                "arn:aws:s3:::bucket2"
            ]
        },
        {
            "Action": [
                "s3:GetObject",
                "s3:PutObject",
                "s3:DeleteObject",
                "s3:AbortMultipartUpload",
                "s3:ListMultipartUploadParts"
            ],
            "Effect": "Allow",
            "Resource": [
                "arn:aws:s3:::bucket1/*",
                "arn:aws:s3:::bucket2/*"
            ]
        }
    ],
    "Version": "2012-10-17"
}
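To confirm the recreated bucket really is in us-east-2, the bucket location can also be checked from one of the nodes (a sketch, assuming the AWS CLI is installed there and picks up the instance role):

aws s3api get-bucket-location --bucket qm-elastic-snapshot-na6

# expected for a us-east-2 bucket:
# {
#     "LocationConstraint": "us-east-2"
# }

A bucket in us-east-1 would return a null LocationConstraint instead.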

It seems as if some setting may have been "cached" somewhere in the Elasticsearch config? I use Kibana to create the snapshot repo and verify it, with no luck. I have also tried specifying the region manually in the repository settings, which I normally do not have to do, but still no luck.
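To rule out a stale repository definition, the repo can also be deleted and re-registered from scratch; deleting a repository only removes the reference to the bucket, not any snapshot data:

DELETE _snapshot/qm-na6

followed by the same PUT as above.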

Here is the error when manually specifying the region:

{
    "error": {
        "root_cause": [
            {
                "type": "repository_verification_exception",
                "reason": "[qm-na6] [[91KomCdaRN63dj6PItTSLg, 'RemoteTransportException[[REDACTED.net][10.125.52.89:9300][internal:admin/repository/verify]]; nested: RepositoryMissingException[[qm-na6] missing];'], [QO5rq9A0SqGeOCaLgiqtbA, 'RemoteTransportException[[REDACTED.net][10.125.52.96:9300][internal:admin/repository/verify]]; nested: RepositoryMissingException[[qm-na6] missing];'], [ow39ZlVLQyaryKluekPBGA, 'RemoteTransportException[[REDACTED.net][10.125.48.159:9300][internal:admin/repository/verify]]; nested: RepositoryMissingException[[qm-na6] missing];'], [GdeKk7HTSSKQ3GpSvWPTig, 'RemoteTransportException[[REDACTED.net][10.125.48.250:9300][internal:admin/repository/verify]]; nested: RepositoryMissingException[[qm-na6] missing];']]"
            }
        ],
        "type": "repository_verification_exception",
        "reason": "[qm-na6] [[91KomCdaRN63dj6PItTSLg, 'RemoteTransportException[[REDACTED.net][10.125.52.89:9300][internal:admin/repository/verify]]; nested: RepositoryMissingException[[qm-na6] missing];'], [QO5rq9A0SqGeOCaLgiqtbA, 'RemoteTransportException[[REDACTED.net][10.125.52.96:9300][internal:admin/repository/verify]]; nested: RepositoryMissingException[[qm-na6] missing];'], [ow39ZlVLQyaryKluekPBGA, 'RemoteTransportException[[REDACTED.net][10.125.48.159:9300][internal:admin/repository/verify]]; nested: RepositoryMissingException[[qm-na6] missing];'], [GdeKk7HTSSKQ3GpSvWPTig, 'RemoteTransportException[[REDACTED.net][10.125.48.250:9300][internal:admin/repository/verify]]; nested: RepositoryMissingException[[qm-na6] missing];']]"
    },
    "status": 500
}

Here is the snapshot repo config with the region added:

{
    "type": "s3",
    "settings": {
        "bucket": "qm-elastic-snapshot-na6",
        "canned_acl": "bucket-owner-full-control",
        "storage_class": "standard_ia",
        "region": "us-east-2"
    }
}
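For completeness, verification can also be triggered manually with the verify API, which as far as I can tell is the same check Kibana runs when validating the repository:

POST _snapshot/qm-na6/_verify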

I have found the issue. After a full cluster restart I was still getting the same RemoteTransportException. Some more searching led me to check the installed plugins, and I found that some of my nodes were missing the repository-s3 plugin.
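In case it helps anyone else, the per-node plugin list can be checked with the cat plugins API, and the missing plugin installed on each affected node (the path below assumes a package install; the node needs a restart afterwards):

GET _cat/plugins?v

# on each node that is missing it:
sudo /usr/share/elasticsearch/bin/elasticsearch-plugin install repository-s3

GET _cat/plugins lists the plugin name and version per node, which makes it easy to spot exactly which nodes are missing repository-s3.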
