Elasticsearch snapshot partially fails due to "index shard snapshot failed exception"

Kibana version: 7.4.2

Elasticsearch version: 7.4.2

Original install method (e.g. download page, yum, deb, from source, etc.) and version: all components were downloaded from official elastic page.

Fresh install or upgraded from other version? Upgraded from 7.3.2 and before it was 7.0.0

Is there anything special in your setup? 1x master node, 2x data nodes, 1x coordinating node. The coordinating Elasticsearch instance is installed on the same server as Kibana; the rest are on separate machines.

Description of the problem including expected versus actual behavior. Please include screenshots (if relevant): My goal is to create a backup of our system indices, which contain all of the visualizations, dashboards and other objects we created. We decided to use snapshots as the solution, especially since snapshot lifecycle management was introduced in 7.4. I created a repository using this:

PUT /_snapshot/elk_backup
{
  "type": "fs",
  "settings": {
    "location": "nightly_snapshot"
  }
}

Then I created a policy to take daily snapshots of the indices:

PUT /_slm/policy/nightly-snapshots
{
  "schedule": "0 0 20 ? * MON-FRI", 
  "name": "<nightly-snap-{now/d}>", 
  "repository": "elk_backup", 
  "config": { 
    "indices": [".kibana*"] 
  }
}

The snapshot was created at night, but I got a partial failure on one out of five indices. The indices I want to snapshot are:
.kibana_1
.kibana_2
.kibana_3
.kibana_task_manager_1
.kibana_task_manager_2
The error I got was:

INTERNAL_SERVER_ERROR: IndexShardSnapshotFailedException[ElasticsearchException[failed to create blob container]; nested: AccessDeniedException[/elastic_backup/nightly_snapshot/indices/t2UIMX5kQaS2_B7nWGl8Kg/0];]; nested: ElasticsearchException[failed to create blob container]; nested: AccessDeniedException[/elastic_backup/nightly_snapshot/indices/t2UIMX5kQaS2_B7nWGl8Kg/0];

To check whether it was a one-time problem, I ran the policy manually and got the same error, but this time for four indices. I checked the permissions on the directory and they are the same for every directory. I did not find any errors in the Elasticsearch log. I do not understand why some indices/shards failed and some did not. Why do some of those indices give me access denied when making a snapshot?

Could you help me solve this problem?

If anything else is needed, let me know.
Thank you in advance!

Hi @Futerkowiec,

Could you give us a little more insight into what kind of shared file system you have mounted at your snapshot location (/elastic_backup/nightly_snapshot/), please?
Is it an SMB share by any chance?

Thanks!

Hi @Armin_Braun,

Thanks for the response. We are using NFSv4.

If more information is needed, let me know.

@Futerkowiec

This looks like it may be slowness and insufficient timeouts in your NFS configuration (just guessing), or some other transient issue with the NFS setup (those often bubble up as AccessDeniedException in Java). Could you check the dmesg output from the data nodes around the time of the failing snapshot for NFS-related errors (or paste it here if unsure)?
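
To narrow down which data node to inspect, it may help to group the shard failures by node. The sketch below assumes the JSON body returned by the get snapshot API (GET /_snapshot/elk_backup/<snapshot name>) has already been loaded into a dict, and that each entry in its "failures" list carries "index", "shard_id" and "node_id" fields. The SAMPLE data and its node ID are made up for illustration and are not taken from this cluster.

```python
from collections import defaultdict


def failures_by_node(snapshot_info):
    """Group shard failures from a get-snapshot response by node ID."""
    grouped = defaultdict(list)
    for snapshot in snapshot_info.get("snapshots", []):
        for failure in snapshot.get("failures", []):
            # Fall back to "unknown" if a failure entry has no node_id.
            node = failure.get("node_id", "unknown")
            grouped[node].append((failure.get("index"), failure.get("shard_id")))
    return dict(grouped)


# Hypothetical response, trimmed to the fields used above.
SAMPLE = {
    "snapshots": [
        {
            "snapshot": "nightly-snap-example",
            "failures": [
                {"index": ".kibana_1", "shard_id": 0, "node_id": "nodeA"},
                {"index": ".kibana_task_manager_1", "shard_id": 0, "node_id": "nodeA"},
            ],
        }
    ]
}

for node, shards in failures_by_node(SAMPLE).items():
    print(node, shards)
```

If all failures cluster on one node, that is the machine whose dmesg output and mount settings are worth checking first.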

It looks like different machines have different UIDs for the Elasticsearch user, so if one node creates a file or directory, the other nodes might not have permission to write to it.
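
One way to check for a UID mismatch is to compare, on each node, the numeric UID of the account that runs Elasticsearch with the numeric owner of the repository directory. The following is a minimal sketch using only the Python standard library (Unix only). The helper names are hypothetical, and the account name "elasticsearch" used in the note below is an assumption (it is the usual name for package installs but may differ on your machines).

```python
import os
import pwd
import stat


def describe_owner(path):
    """Return the numeric owner, group and permission string of path."""
    info = os.stat(path)
    return {
        "uid": info.st_uid,
        "gid": info.st_gid,
        "mode": stat.filemode(info.st_mode),
    }


def lookup_uid(username):
    """Return the local numeric UID for username, or None if it does not exist."""
    try:
        return pwd.getpwnam(username).pw_uid
    except KeyError:
        return None
```

Running lookup_uid("elasticsearch") and describe_owner("/elastic_backup/nightly_snapshot") on every node and comparing the numbers should show whether they differ between machines. If they do, directories created by one node under indices/ can end up unwritable for the others, which would match the AccessDeniedException above.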
