I take an Elasticsearch backup every hour. After about 370 backups (about 15 days), my backup repository is more than 15G, but the total size of the indices is only about 500M! I know Elasticsearch snapshots are incremental, but 15G vs 500M is a huge difference. Is such a big gap between the indices and the backup repository normal?
Is it caused by my frequent (hourly) backups? I back up cluster 1 hourly and restore into cluster 2 hourly to keep the two ES clusters' data in sync in near real time.
======
My Elasticsearch setup: 2 nodes, 12 shards per node, 2 indices, and an fs-type backup repository that stores snapshots to a NAS.
In the Elasticsearch data directory, the indices take up about 500M in total.
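For reference, this is roughly how I registered the repository and how the hourly snapshot gets taken. A minimal sketch in Python; the repository name, NAS mount path, and snapshot naming here are illustrative, not my exact production config:

```python
import datetime

import requests

ES = "http://localhost:9200"

# Register an fs-type repository pointing at the NAS mount.
# The path must also be whitelisted via path.repo in elasticsearch.yml.
requests.put(
    f"{ES}/_snapshot/nas_backup",
    json={"type": "fs", "settings": {"location": "/mnt/nas/es_backup"}},
).raise_for_status()

# Take one snapshot per hour, named by UTC timestamp.
snap = "hourly-" + datetime.datetime.utcnow().strftime("%Y%m%d-%H")
requests.put(
    f"{ES}/_snapshot/nas_backup/{snap}",
    params={"wait_for_completion": "true"},
).raise_for_status()
```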
Snapshot and restore works at the segment level and is incremental in the sense that it snapshots a segment only once, even if that segment is used by multiple snapshots. This is described quite well in this blog post. As segments merge, the new, merged segments are backed up as well, and since multiple segments can hold the same records, snapshotting is not incremental at the record level.
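You can see this segment-level granularity yourself by listing the segments backing an index and the per-snapshot stats. A rough sketch; the index, repository, and snapshot names are assumptions:

```python
import requests

ES = "http://localhost:9200"

# Each row here is one Lucene segment backing the index. A snapshot
# copies a segment file only if the repository does not already hold it,
# so unchanged segments are never transferred twice.
print(requests.get(f"{ES}/_cat/segments/my_index", params={"v": "true"}).text)

# Per-snapshot status shows how many files (segments) each snapshot
# actually added to the repository.
print(requests.get(f"{ES}/_snapshot/nas_backup/hourly-20160101-00/_status").json())
```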
Yes, I found and read the blog you posted in another topic before opening this one. I don't think my case involves segment merging...
I'm just curious why the backup repository (15G) is so huge compared with the indices data (500M). It grows by almost 1G every day, so it will eat up my NAS soon.
Ah, I did not realize that segment merges happen automatically as long as there is indexing into ES.
Given that, I tried following the "Merge" part of the posted blog: after a merge, the next backup more than doubles the repository size. I'm not sure how often merges happen while indexing into ES, but if every bit of indexing triggers a merge, then each backup will obviously add a lot to the repository. So a large backup repository size is expected, right? (The merge call I used is sketched below.)
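A sketch of the merge I triggered, assuming a hypothetical index name. On older ES versions the endpoint was `_optimize`; on current ones it is `_forcemerge`:

```python
import requests

ES = "http://localhost:9200"

# Force-merge the index down to a single segment. The merged segment is
# a brand-new file, so the next snapshot must copy it in full even
# though every record in it already exists in earlier snapshots.
requests.post(
    f"{ES}/my_index/_forcemerge",
    params={"max_num_segments": 1},
).raise_for_status()
```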
I am using backup and restore to sync data between our production system and our disaster recovery system, and backup/restore happens hourly. That is to say, I have to periodically delete old snapshots via the API, right?
Deleting old snapshots will remove segments that are no longer referenced by any snapshot, and will reduce storage space. How many old snapshots do you need to keep? How far back in time do you need to be able to restore?
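A minimal retention sketch along those lines, keeping only the newest N snapshots. The repository name and keep count are assumptions; newer Elasticsearch versions can also automate this with snapshot lifecycle management:

```python
import requests

ES = "http://localhost:9200"
REPO = "nas_backup"
KEEP = 48  # e.g. two days of hourly snapshots

# List all snapshots in the repository, oldest first.
snaps = requests.get(f"{ES}/_snapshot/{REPO}/_all").json()["snapshots"]
snaps.sort(key=lambda s: s["start_time_in_millis"])

# Delete everything but the newest KEEP snapshots. Segments no longer
# referenced by any remaining snapshot are then freed from the repository.
for s in snaps[:-KEEP]:
    requests.delete(f"{ES}/_snapshot/{REPO}/{s['snapshot']}").raise_for_status()
```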
Yes, that is correct. If you are constantly indexing into the cluster, segments will continuously merge in the background, and the same record will end up in multiple segments over time, resulting in a repository that is considerably larger than the index size.