S3 repository creation failing for cluster

Hi,

I am trying to register an S3 repository for my Elasticsearch cluster, which has 2 data nodes, 1 master and 1 client. The ES version is 6.1.
When I issue the _snapshot command, I get the following error:

{
  "error": {
    "root_cause": [
      {
        "type": "repository_verification_exception",
        "reason": "[hcdd_stg_repository] [[_KD96f6KR6iS8qrGDTSXXA, 'RemoteTransportException[[dd-es-stg-node-1][xx.xxx.xx.xxx:9300][internal:admin/repository/verify]]; nested: RepositoryMissingException[[hcdd_stg_repository] missing];'], [OAjFobxlRYeauUGZL2m21w, 'RemoteTransportException[[dd-es-stg-node-2][yy.yyy.yy.yy:9300][internal:admin/repository/verify]]; nested: RepositoryMissingException[[hcdd_stg_repository] missing];']]"
      }
    ],
    "type": "repository_verification_exception",
    "reason": "[hcdd_stg_repository] [[_KD96f6KR6iS8qrGDTSXXA, 'RemoteTransportException[[dd-es-stg-node-1][xx.xxx.xx.xxx:9300][internal:admin/repository/verify]]; nested: RepositoryMissingException[[hcdd_stg_repository] missing];'], [OAjFobxlRYeauUGZL2m21w, 'RemoteTransportException[[dd-es-stg-node-2][yy.yyy.yy.yy:9300][internal:admin/repository/verify]]; nested: RepositoryMissingException[[hcdd_stg_repository] missing];']]"
  },
  "status": 500
}

I enabled DEBUG logging but did not find anything useful. Using the AWS CLI, I have put and deleted files in S3 directly from my EC2 instances (the ES nodes) without issues, which suggests that permissions are not the problem. I have an IAM role created and assigned to these instances, and my IAM policy looks like this:

{
  "Statement": [
    {
      "Action": [
        "s3:ListBucket",
        "s3:GetBucketLocation",
        "s3:ListBucketMultipartUploads",
        "s3:ListBucketVersions"
      ],
      "Effect": "Allow",
      "Resource": [
        "arn:aws:s3:::snaps.example.com"
      ]
    },
    {
      "Action": [
        "s3:GetObject",
        "s3:PutObject",
        "s3:DeleteObject",
        "s3:AbortMultipartUpload",
        "s3:ListMultipartUploadParts"
      ],
      "Effect": "Allow",
      "Resource": [
        "arn:aws:s3:::snaps.example.com/*"
      ]
    }
  ],
  "Version": "2012-10-17"
}

When I try registering the repository as read only, it succeeds.
What am I missing? Thanks for your help.
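
For readers following along, the registration call described above can be sketched roughly as follows. This is only an illustration, not the exact command used in this thread: the endpoint http://localhost:9200 and the helper names are assumptions, and the bucket name is taken from the IAM policy on the assumption that it is the bucket the repository points at. The readonly flag corresponds to the read-only registration that succeeded.

```python
import json
import urllib.request

# Assumption: an Elasticsearch node reachable over HTTP at this address.
ES_URL = "http://localhost:9200"


def build_register_request(repo, bucket, readonly=False, base_url=ES_URL):
    """Build (but do not send) a PUT _snapshot/<repo> request for an S3 repository."""
    body = {
        "type": "s3",
        "settings": {"bucket": bucket, "readonly": readonly},
    }
    return urllib.request.Request(
        url=f"{base_url}/_snapshot/{repo}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


def send(request):
    """Send a prepared request and return the decoded response body."""
    with urllib.request.urlopen(request) as response:
        return response.read().decode("utf-8")


# Hypothetical usage (requires a reachable cluster):
#   send(build_register_request("hcdd_stg_repository", "snaps.example.com"))
```

Registering with readonly=True skips the write test that Elasticsearch performs during verification, which may be why that variant succeeded here while the writable registration failed.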

Restarting Elasticsearch on all the cluster nodes fixed the issue. I think it has to do with the fact that the IAM role was created and assigned to these EC2 instances while they were already running. Posting this update in case somebody else faces the same issue.
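
After a restart like this, the repository can be re-checked with the snapshot verify endpoint. Below is a minimal sketch, assuming a node reachable at http://localhost:9200; the helper names are hypothetical. The second function pulls the node names out of a verification error such as the one quoted in the question, which makes it easier to see which nodes still fail.

```python
import re
import urllib.request

# Assumption: an Elasticsearch node reachable over HTTP at this address.
ES_URL = "http://localhost:9200"


def build_verify_request(repo, base_url=ES_URL):
    """Build (but do not send) a POST _snapshot/<repo>/_verify request."""
    return urllib.request.Request(
        url=f"{base_url}/_snapshot/{repo}/_verify",
        method="POST",
    )


def failed_nodes(reason):
    """Extract node names from a repository_verification_exception reason string."""
    return re.findall(r"RemoteTransportException\[\[([^\]]+)\]", reason)


# Hypothetical usage (requires a reachable cluster):
#   with urllib.request.urlopen(build_verify_request("hcdd_stg_repository")) as r:
#       print(r.read().decode("utf-8"))
```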
