Are there any known metrics for index size versus snapshot repository size if we are only ever snapshotting one index, as above? Let's say, for the sake of discussion, that the total size of all shards in the cluster is 50 GB. Does anyone have an idea of how to guesstimate how much storage such a snapshot repository would need?
I should add that I registered the repository with the `compress: true` setting.
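For reference, here's roughly what my registration call looks like (the repository name and filesystem location below are placeholders, not my real values, and the path has to be whitelisted under `path.repo` on the nodes):

```
// "my_backup_repo" and the location are placeholders for illustration
PUT _snapshot/my_backup_repo
{
  "type": "fs",
  "settings": {
    "location": "/mnt/backups/my_backup_repo",
    "compress": true
  }
}
```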
But doesn't the repo still hold a history of your index over time, or at least one entry per snapshot? Say you are snapshotting it several times a day. Can the snapshot repo really stay roughly equal to the size of the index being backed up, without ever taking up much more space than the index itself?
Let's add a time element to the question. Let's say you have an index (same description as in the original post, registered with compression on) that grows from 50 GB to 75 GB over the course of a year, and that we are snapshotting 3 times a day. At the end of that year, would I really only be using about 75 GB to store all of this?
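To put a number on the naive worst case: 3 snapshots a day for a year is roughly 1,095 snapshots. If each one were a full standalone copy at an average index size of ~62.5 GB, that would be on the order of 68 TB. So what I'm really asking is whether whatever incremental/deduplication behavior snapshots have under the covers collapses that all the way down to roughly the live index size (~75 GB plus some metadata overhead), or to something in between.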
And if no metrics have been published, then I'm really just asking what people would guess based on experience. Is it really the case that, with compression and whatever Elasticsearch does under the covers, the repository shouldn't end up much bigger than the index itself?