Hello, hope you are all doing well!
For the last few years we have been using the Snapshot & Restore feature over a shared file system repository (mounted via SMB), and everything seemed to work well.
We did some restores earlier this year using a repository named "Snapshots_2" located at "/snapshots2", which resided on a TrueNAS.
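For context, the repository was registered as a plain shared file system repository; the registration looked roughly like the following (the exact settings body is from memory and may differ slightly on our side):
# Shared file system repository (type "fs"); settings approximated, location as used in our setup
PUT /_snapshot/Snapshots_2
{
  "type": "fs",
  "settings": {
    "location": "/snapshots2"
  }
}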
This month we started to have a couple of issues, so I'll break down our actions for better understanding.
- Earlier this month (August 2025), we upgraded to a new TrueNAS, keeping every single disk (~500 TB of data) from the previous TrueNAS.
- For some unknown reason, once we reactivated this repository on the new TrueNAS, Elasticsearch would "verify" the repository successfully, although it was not possible to read/restore any of the snapshots.
- We then started to debug the situation but couldn't find a clear cause. We recreated the repository in the "Repositories" section of Snapshot & Restore and rebooted the TrueNAS.
- With the repository recreated, Elasticsearch started a "repository cleanup" process on its own that took approximately 24 hours to finish (see the cleanup sketch after this list).
- A team member then restarted ILM, and some snapshots were taken (1581 to be precise).
- However, I managed to find out that the older snapshots were not accessible, and that we currently have two index-N files under the /snapshots2 folder (one referencing the older snapshots, 116750 in total, and one referencing the 1581 new ones).
- We recreated the repository once again in the "Repositories" section of Snapshot & Restore, and we are now reading the older snapshots (the 116750 set).
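For reference, the automatic cleanup mentioned above seems to correspond to the operation that can also be triggered manually; if needed, it would be something along these lines (repository name is ours):
# Manually trigger cleanup of unreferenced data in the repository
POST /_snapshot/Snapshots_2/_cleanup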
The older snapshots (the 116750 set) are the ones worrying us, because they hold about 500 TB of data and are throwing some errors:
a. We can list the snapshots successfully with:
GET /_snapshot/Snapshots_2/all-1hour-365days-2025.07.17-19:40-gmjbuqyyqtovppizh3whew?verbose=false
Answer:
{
  "snapshots": [
    {
      "snapshot": "all-1hour-365days-2025.07.17-19:40-gmjbuqyyqtovppizh3whew",
      "uuid": "abbFdVX9Qp-Gr7cimlai6Q",
      "repository": "Snapshots_2",
      "indices": [
        ".ds-winlogbeat-siem-ds-2025.07.16-000818"
      ],
      "data_streams": [],
      "state": "SUCCESS"
    }
  ],
  "total": 1,
  "remaining": 0
}
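If it helps with the diagnosis, we can also pull shard-level details for one of the affected snapshots; presumably something along these lines (same snapshot name as above):
# Shard-level status for one affected snapshot (may fail the same way the restore does)
GET /_snapshot/Snapshots_2/all-1hour-365days-2025.07.17-19:40-gmjbuqyyqtovppizh3whew/_status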
b. We cannot restore, due to a missing index metadata file:
Our POST:
/_snapshot/Snapshots_2/all-1hour-365days-2025.07.17-19:40-gmjbuqyyqtovppizh3whew/_restore
{
  "indices": ".ds-winlogbeat-siem-ds-2025.07.16-000818",
  "rename_pattern": "(.+)",
  "rename_replacement": "restored-$1"
}
Answer:
{
  "error": {
    "root_cause": [
      {
        "type": "snapshot_missing_exception",
        "reason": "[Snapshots_2:all-1hour-365days-2025.07.17-19:40-gmjbuqyyqtovppizh3whew/abbFdVX9Qp-Gr7cimlai6Q] is missing"
      }
    ],
    "type": "snapshot_missing_exception",
    "reason": "[Snapshots_2:all-1hour-365days-2025.07.17-19:40-gmjbuqyyqtovppizh3whew/abbFdVX9Qp-Gr7cimlai6Q] is missing",
    "caused_by": {
      "type": "no_such_file_exception",
      "reason": "/snapshots2/indices/A1EcVWcYSseOKpbBVdXzEg/meta-aVpiwJcBY7800yEqmipQ.dat"
    }
  },
  "status": 404
}
c. In fact, we cannot find /snapshots2/indices/A1EcVWcYSseOKpbBVdXzEg/meta-aVpiwJcBY7800yEqmipQ.dat on disk. However, the folder /snapshots2/indices/A1EcVWcYSseOKpbBVdXzEg/ does exist, with /0 and /1 subfolders containing some snap-*.dat and other files.
We are now trying to identify:
- What caused/is causing this behaviour?
- Is there any way to restore these snapshots/indices?
It is also important to mention that these indices belong to a data stream named winlogbeat-siem-ds, and that due to the time and effort involved we have not yet run _verify_repository, but we are starting it shortly.
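For clarity, by _verify_repository we mean the standard repository verification endpoint, nothing custom on our side (assuming that is the right call for this situation):
# Standard repository verification against our repository
POST /_snapshot/Snapshots_2/_verify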
Any kind of help would be precious, as we will very likely need to do restores in the future.
Looking forward to hearing from you, thank you so much.