This is not a bug report; I'd like to start a discussion about the disk space used by snapshots.
My issue is that snapshots take up so much space that snapshot storage has become the most expensive part of the stack. If I gzip a snapshot folder, it shrinks to about 10% of its original size, so I'm sure there is a lot to be gained here. Of course these numbers are specific to my use case, but I'd guess they are comparable for most.
There are some workarounds that "enhance" the snapshot mechanism from the outside. I'll list the few I've found online, but none are officially supported or recommended, so tread lightly:
- Zip the data folders on your own (http://tech.superhappykittymeow.com/?p=296)
- Roll snapshot repositories periodically and zip those (https://blog.jixee.me/elasticsearch-archiving-indexes-on-a-budget/)
- Use a file system that compresses data on the fly like ZFS
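For reference, the "roll and zip" idea in the list above can be sketched roughly like this. It is only an illustration of the workaround, not anything Elasticsearch provides: the function names and paths are made up, and it assumes the repository folder is no longer being written to by any snapshot.

```python
import os
import tarfile


def dir_size(path):
    """Total size in bytes of all regular files under path."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            total += os.path.getsize(os.path.join(root, name))
    return total


def archive_repo(repo_dir, archive_path):
    """Pack repo_dir into a gzip-compressed tarball.

    Returns (raw_bytes, archived_bytes) so the saving can be checked.
    Assumes archive_path is outside repo_dir and that no snapshot is
    currently writing to repo_dir.
    """
    raw = dir_size(repo_dir)
    top = os.path.basename(os.path.normpath(repo_dir))
    with tarfile.open(archive_path, "w:gz") as tar:
        tar.add(repo_dir, arcname=top)
    return raw, os.path.getsize(archive_path)
```

Usage would be something like `raw, packed = archive_repo("/mnt/es_backups/repo_old", "/mnt/archive/repo_old.tar.gz")` (both paths are placeholders), and `packed / raw` gives the ratio for your own data. The archive has to be unpacked back into a registered repository location before anything can be restored from it.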
There are a couple of Elasticsearch settings that you'd think would help, but they really don't:
- Using the index compression codec feature (https://www.elastic.co/blog/store-compression-in-lucene-and-elasticsearch). This only compresses the stored documents, not the index structures around them, which "usually" take up the most space, so it offers only small gains.
- Snapshot repository compress setting (https://www.elastic.co/guide/en/elasticsearch/reference/5.5/modules-snapshots.html). As the documentation states, this only compresses metadata files, so again there is little to gain.
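To be concrete about which two settings I mean, here is a small sketch that just builds and prints the request bodies. The index name, repository name, and location are placeholders, and it assumes a shared file system (`fs`) repository whose location is listed under `path.repo`; nothing is sent to a cluster.

```python
import json

# Index-level codec from the store-compression blog post.
# index.codec is a static setting, so it is given at index creation here.
INDEX_BODY = {"settings": {"index": {"codec": "best_compression"}}}

# Shared file system repository with the compress flag set explicitly.
# The location below is a placeholder.
REPO_BODY = {
    "type": "fs",
    "settings": {"location": "/mnt/es_backups/my_repo", "compress": True},
}


def as_request(method, path, body):
    """Render a request as 'METHOD path' followed by the JSON body."""
    return f"{method} {path}\n{json.dumps(body, indent=2)}"


if __name__ == "__main__":
    print(as_request("PUT", "/my_index", INDEX_BODY))
    print(as_request("PUT", "/_snapshot/my_repo", REPO_BODY))
```

Neither of these touches the bulk of the snapshot data files, which is why the gains are small in my case.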
So is this a problem for other people too? Is there some official plan to save such disk space in the future? Am I missing something that could help me now?