Snapshots on NFS4 intermittently reporting PARTIAL

Hi, we have an intermittent issue with snapshots to an fs repository on NFS4.
Once a day we close the repository for a backup to tape. After this backup has been taken, we delete the data, create a new repository and take an initial snapshot.
After that, we take a snapshot every 15 minutes. A small number of these snapshots report a PARTIAL status, with failures on 1-3 indices.
For example:

IndexShardSnapshotFailedException[Failed to write commit point]; nested: FileSystemException[/backuprepo/indices/kEmnbUZJSBKVcNFMND95Hw/0/snap-CbacWLlcSw2xagCBh6GxQQ.dat: Input/output error]

or

IndexShardSnapshotFailedException[java.nio.file.DirectoryIteratorException: java.nio.file.FileSystemException: /data/backup-folder/smg-pp-b_backuprepo/indices/pK8Fc_wfS1-yVde-ccwrXw/0: Input/output error]; nested: DirectoryIteratorException[java.nio.file.FileSystemException: /data/backup-folder/smg-pp-b_backuprepo/indices/pK8Fc_wfS1-yVde-ccwrXw/0: Input/output error]; nested: FileSystemException[/backuprepo/indices/pK8Fc_wfS1-yVde-ccwrXw/0: Input/output error]; 

We also see other error messages.
The snapshots taken after a PARTIAL one always report SUCCESS.

As I suspect that a rare edge-case NFS4 bug is the root cause, my question is:
If a snapshot reports PARTIAL status and the next ones report SUCCESS, do these successful snapshots heal the affected repository? To put it another way: if an index is not snapshotted successfully, does the next successful snapshot fix the issue, so that the backed-up data is valid and consistent?

Thanks in advance

Technically yes: snapshots are logically independent, so if a snapshot reports SUCCESS then it contains all the data you wanted, regardless of any other failed or partial snapshots. However, an Input/output error indicates that something is fundamentally wrong with your storage, and therefore it may not be possible for Elasticsearch to read the data it just successfully wrote. I would not trust this repository with your data.
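
Since each snapshot's state stands on its own, it can help to check every snapshot individually rather than only the latest one. Below is a minimal Python sketch, assuming you have already fetched the JSON body of the get-snapshot API (for example `GET _snapshot/backuprepo/_all`) by whatever means you prefer. The function names, the sample snapshot names and the index name `index-a` are hypothetical and only there for illustration; the sketch relies only on the `snapshot`, `state` and `failures` fields of that response.

```python
import json

# Hypothetical sample body in the shape of a get-snapshot API response.
# Snapshot and index names here are made up for illustration.
SAMPLE_RESPONSE = json.dumps({
    "snapshots": [
        {"snapshot": "snap-1", "state": "SUCCESS", "failures": []},
        {
            "snapshot": "snap-2",
            "state": "PARTIAL",
            "failures": [
                {"index": "index-a", "shard_id": 0,
                 "reason": "FileSystemException[...: Input/output error]"}
            ],
        },
        {"snapshot": "snap-3", "state": "SUCCESS", "failures": []},
    ]
})


def summarize_snapshots(response_body):
    """Return a list of (name, state, failed_indices), one per snapshot."""
    data = json.loads(response_body)
    summary = []
    for snap in data.get("snapshots", []):
        # Collect each failed index once, even if several shards failed.
        failed = sorted(
            {f["index"] for f in snap.get("failures", []) if f.get("index")}
        )
        summary.append((snap["snapshot"], snap["state"], failed))
    return summary


def needs_attention(summary):
    """Return the names of snapshots whose state is anything but SUCCESS."""
    return [name for name, state, _ in summary if state != "SUCCESS"]


if __name__ == "__main__":
    for name, state, failed in summarize_snapshots(SAMPLE_RESPONSE):
        print(name, state, ", ".join(failed) or "-")
```

Note that this only reports what Elasticsearch recorded when the snapshot was taken. It does not prove that the files are still readable from the NFS mount afterwards, which is the real concern with Input/output errors.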

Thanks @DavidTurner . Your help is very much appreciated.
