In my bucket I set the TTL to 365 days, so old files are removed after one year.
The index snapshot process is incremental. When making a snapshot, Elasticsearch analyses the list of index files already stored in the repository and copies only files that were created or changed since the last snapshot.
My concern is whether I would still be able to restore the most recent snapshot (younger than 1 year) once the older files have been deleted.
Have you tested whether the mechanism still works if some 1-year-old files vanish?
Not sure I was clear.
It is not the Elasticsearch S3 plugin that removes the files.
The files have a Time-To-Live of 365 days and are removed by S3 itself (automatically, with no process involved), regardless of whether they belong to another snapshot. That is why I need to check whether a snapshot can restore 'its part' from its own data.
It is important to be able to restore 'today's snapshot' without 'yesterday's files'.
I've just looked around; changing the S3 TTL (lifecycle) rule to auto-remove only files matching a pattern (like indices/logstash-201*) should probably solve the problem.
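For reference, a prefix-scoped lifecycle rule could look roughly like this (the rule ID and prefix are made up for illustration; check the AWS documentation for the exact schema your API version expects):

```json
{
  "Rules": [
    {
      "ID": "expire-old-logstash-index-files",
      "Filter": { "Prefix": "indices/logstash-201" },
      "Status": "Enabled",
      "Expiration": { "Days": 365 }
    }
  ]
}
```

With a rule like this, S3 would expire only objects under the matching prefix, leaving the top-level index, metadata* and snapshot* files alone.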
I guess we do care about the index, metadata* and snapshot* files, but the files under indices/* that are not part of the snapshot we are restoring could be missing and the restore would still succeed.
I'm happy because my guess was correct. Please take a look:

1. Create a snapshot:
curl -XPUT "localhost:9200/_snapshot/my_fs_repository/2016.05.09_evening?wait_for_completion=true"

2. Remove an index (2016.05.01):
curl -XDELETE 'http://localhost:9200/logstash-2016.05.01*?pretty'

3. Take another snapshot:
curl -XPUT "localhost:9200/_snapshot/my_fs_repository/2016.05.09_evening2?wait_for_completion=true"

4. Pretend to remove the index from the filesystem repository (not via the API):
mv /mnt/nfs/indices/logstash-2016.05.01/ /mnt/nfs/indices/logstash-2016.05.01_pseudoremove

The index content:
ls /mnt/nfs/indices/logstash-2016.05.01_pseudoremove
0 1 2 3 4 snapshot-2016.05.08 snapshot-2016.05.08_clean snapshot-2016.05.08_clean2 snapshot-2016.05.08_clean3 snapshot-2016.05.09_evening

5. Close all indices (do I always have to close indices on a full snapshot restore?):
curl -XPOST "localhost:9200/*/_close"
{"acknowledged":true}

6. Restoring the first snapshot fails:
curl -XPOST "localhost:9200/_snapshot/my_fs_repository/2016.05.09_evening/_restore?pretty"
{
"error" : "SnapshotException[[my_fs_repository:2016.05.09_evening] failed to read metadata]; nested: FileNotFoundException[/mnt/nfs/indices/logstash-2016.05.01/snapshot-2016.05.09_evening (No such file or directory)]; ",
"status" : 500
}

7. But this one works:
curl -XPOST "localhost:9200/_snapshot/my_fs_repository/2016.05.09_evening2/_restore?pretty"
{
"accepted" : true
}
So with the right file-delete policy it works without an API delete call.
Probably some orphaned snapshot metadata will be left behind, but that is something I can accept.
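The experiment above boils down to a simple invariant: a restore only needs the files its own snapshot references, so unreferenced files can vanish freely. A toy model of that logic (an illustration only, not how Elasticsearch is actually implemented; all names are made up):

```python
# Toy model: the repository maps file paths to contents, and each snapshot
# records the set of files it references. A restore succeeds only if every
# referenced file is still present.

repo = {
    "indices/logstash-2016.05.01/0": b"...",
    "indices/logstash-2016.05.02/0": b"...",
    "snapshot-evening": b"...",
    "snapshot-evening2": b"...",
}

snapshots = {
    "evening":  {"snapshot-evening",
                 "indices/logstash-2016.05.01/0",
                 "indices/logstash-2016.05.02/0"},
    "evening2": {"snapshot-evening2",
                 "indices/logstash-2016.05.02/0"},
}

def can_restore(name: str) -> bool:
    """A snapshot is restorable iff all files it references still exist."""
    return snapshots[name] <= repo.keys()

# Simulate the TTL (or the 'mv' in step 4) deleting the 2016.05.01 files.
del repo["indices/logstash-2016.05.01/0"]

print(can_restore("evening"))   # the older snapshot now fails
print(can_restore("evening2"))  # the newer one still restores
```

This mirrors the outcome of steps 6 and 7: the snapshot that references the removed index files fails, while the later snapshot restores fine.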