Backup repository size is much bigger than indices size

Faye · April 15, 2016, 7:28am

I take an Elasticsearch snapshot every hour. After about 370 snapshots (about 15 days), my backup repository is more than 15G, but the total size of the indices is only about 500M. Elasticsearch snapshots are incremental, so a difference of 15G vs 500M seems huge. Is it normal for the backup repository to be so much bigger than the indices?
Is it caused by my frequent (hourly) backups? I take an hourly backup on cluster 1 and do an hourly restore on cluster 2 to keep the data in the two ES clusters in sync in near real time.

======
My Elasticsearch setup: 2 nodes, 12 shards per node, 2 indices, and an fs-type repository that stores snapshots on a NAS.

The indices size in the Elasticsearch data directory:

node1 indices size

[root@esnode1 indices]$ du -sh
266M .

node2 indices size

[root@esnode2 indices]$ du -sh
238M .

The size of the backup repository:

[root@esnode1 backup]$ du -lh
114M ./backup/indices/index1/0
112M ./backup/indices/index1/5
114M ./backup/indices/index1/11
114M ./backup/indices/index1/10
111M ./backup/indices/index1/8
116M ./backup/indices/index1/4
120M ./backup/indices/index1/9
118M ./backup/indices/index1/3
114M ./backup/indices/index1/2
115M ./backup/indices/index1/7
115M ./backup/indices/index1/1
112M ./backup/indices/index1/6
1.4G ./backup/indices/index1
747M ./backup/indices/index2/0
1.6G ./backup/indices/index2/5
887M ./backup/indices/index2/11
743M ./backup/indices/index2/10
2.1G ./backup/indices/index2/8
801M ./backup/indices/index2/4
1.3G ./backup/indices/index2/9
878M ./backup/indices/index2/3
951M ./backup/indices/index2/2
1.2G ./backup/indices/index2/7
953M ./backup/indices/index2/1
943M ./backup/indices/index2/6
13G ./backup/indices/index2
15G ./backup/indices
15G ./backup
1.1M ./backuplogs
15G .

Christian_Dahlqvist · April 15, 2016, 7:54am

Snapshot and restore works at the segment level and is incremental in that it will only snapshot a segment once even if it is used in multiple snapshots. This is described quite well in this blog post. As segments merge, these new segments will also be backed up, and as there will be multiple segments that hold the same records, snapshotting is not incremental at the record level.
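
To make the segment-level behaviour concrete, here is a minimal toy model in Python. It is not Elasticsearch code: the segment names, sizes, and the `take_snapshot` helper are invented for illustration, and real merges are far less tidy. It only sketches why a repository can grow well past the index size even though each segment is copied once.

```python
# Toy model of segment-level incremental snapshots (not Elasticsearch code).
# Segment names and sizes (in MB) are invented for illustration.

def take_snapshot(repo_files, snapshots, name, live_segments):
    """Copy only segments the repository does not hold yet, then record the snapshot."""
    for segment, size_mb in live_segments.items():
        if segment not in repo_files:
            repo_files[segment] = size_mb
    snapshots[name] = set(live_segments)

repo_files = {}   # segment name -> size in MB stored in the repository
snapshots = {}    # snapshot name -> set of segment names it references

# Hour 1: two segments exist, both are copied.
live = {"seg_1": 10, "seg_2": 10}
take_snapshot(repo_files, snapshots, "snap_1", live)

# Hour 2: new documents were flushed into a third segment; only that one is copied.
live["seg_3"] = 5
take_snapshot(repo_files, snapshots, "snap_2", live)

# Hour 3: a background merge rewrote all three segments into one new segment.
# The records are the same, but the segment file is new, so it is copied in full.
live = {"seg_4": 25}
take_snapshot(repo_files, snapshots, "snap_3", live)

index_size = sum(live.values())
repo_size = sum(repo_files.values())
print(f"index: {index_size} MB, repository: {repo_size} MB")
```

In this sketch the index holds 25 MB while the repository holds 50 MB, because the pre-merge segments are still referenced by the earlier snapshots.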

Faye · April 15, 2016, 8:40am

Yes, I found and read that blog post in another topic before posting this one. My case does not involve segment merging.
I'm just curious why the backup repository is so large (15G) compared with the index data (500M). It grows by almost 1G every day and will eat up my NAS soon.

Faye · April 15, 2016, 8:41am

Since the backup is incremental, my backup size should be similar to the indices size, or at most double it, since there are some snapshot record files.

Christian_Dahlqvist · April 15, 2016, 9:04am

If you index into Elasticsearch, segments will automatically be merged in the background.

Faye · April 15, 2016, 11:04am

Ah, I did not realize that segment merges happen automatically whenever there is indexing into ES.
I then tried following the "Merge" part of the blog post: after a merge, when I backed up again, the repository grew to more than double its size. I'm not sure how often merges happen when indexing into ES; if every indexing operation triggered a merge, the repository would certainly become very large after each backup. So a large backup repository is expected, right?
I am using backup and restore to sync data between our production system and our disaster recovery system, and the backup/restore runs hourly. That means I have to delete old snapshots through the API periodically, right?

Christian_Dahlqvist · April 15, 2016, 12:13pm

Deleting old snapshots will remove segments that no snapshot refers to any longer, and will reduce the storage space used. How many old snapshots do you need to keep? How far back in time do you need to be able to restore?
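
As a rough illustration of that cleanup rule, the Python sketch below computes which segment files would become unreferenced if a given set of snapshots were deleted. The snapshot and segment names and the `unreferenced_after_delete` helper are hypothetical; this models the idea only and is not how Elasticsearch implements it.

```python
# Toy model: which segment files does nobody reference after deleting snapshots?
# Names are hypothetical; this is not Elasticsearch's implementation.

def unreferenced_after_delete(snapshots, to_delete):
    """Return segment files referenced only by the snapshots being deleted."""
    remaining = [segs for name, segs in snapshots.items() if name not in to_delete]
    still_needed = set().union(*remaining)
    all_files = set().union(*snapshots.values())
    return all_files - still_needed

snapshots = {
    "snap_1": {"seg_1", "seg_2"},
    "snap_2": {"seg_1", "seg_2", "seg_3"},
    "snap_3": {"seg_4"},
}

# Deleting only snap_1 frees nothing: snap_2 still references its segments.
freed_one = unreferenced_after_delete(snapshots, {"snap_1"})

# Deleting snap_1 and snap_2 frees all three pre-merge segments.
freed_two = unreferenced_after_delete(snapshots, {"snap_1", "snap_2"})

print(sorted(freed_one))
print(sorted(freed_two))
```

The point of the sketch is that space is only reclaimed once every snapshot that references a segment is gone, so deleting a single old snapshot may free little or nothing.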

Faye · April 18, 2016, 5:53am

I think two months is enough from a business point of view, so I can delete the old snapshots every two months.
Just want to confirm with you again:

It is normal for the indices and the backup repository to differ this much in size (500M vs 15G) in my case, right?
Some of the redundant data in the snapshots is caused by Lucene segment merges, right?
Thanks!

Christian_Dahlqvist · April 18, 2016, 6:05am

Yes, that is correct. If you are constantly indexing into the cluster, merging of segments will continuously happen in the background and the same record will end up in multiple segments over time, resulting in a repository that is considerably larger than the index size.
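
For the two-month retention window mentioned earlier in the thread, a small helper along these lines could pick out which snapshots to delete. This is only a sketch under assumptions: the repository name `my_backup`, the snapshot names, and the 60-day cutoff are made up, and the `snapshot` and `start_time_in_millis` fields are assumed to match what the snapshot listing API returns on your version, so check that before relying on it. The code only builds the request paths for the snapshot delete endpoint; it sends nothing.

```python
from datetime import datetime, timedelta, timezone

def expired_snapshot_paths(repository, snapshot_infos, now, keep_days=60):
    """Build DELETE paths for snapshots that started before the retention cutoff."""
    cutoff_ms = int((now - timedelta(days=keep_days)).timestamp() * 1000)
    return [
        f"/_snapshot/{repository}/{info['snapshot']}"
        for info in snapshot_infos
        if info["start_time_in_millis"] < cutoff_ms
    ]

def to_millis(moment):
    """Convert an aware datetime to epoch milliseconds."""
    return int(moment.timestamp() * 1000)

# Hypothetical snapshot listing, shaped like the entries of a snapshot info response.
snapshot_infos = [
    {"snapshot": "hourly-old",
     "start_time_in_millis": to_millis(datetime(2016, 1, 1, tzinfo=timezone.utc))},
    {"snapshot": "hourly-recent",
     "start_time_in_millis": to_millis(datetime(2016, 4, 17, tzinfo=timezone.utc))},
]

now = datetime(2016, 4, 18, tzinfo=timezone.utc)
paths = expired_snapshot_paths("my_backup", snapshot_infos, now)
for path in paths:
    print("DELETE", path)
```

Running the cleanup on a schedule (for example daily) rather than once every two months keeps the repository size steadier, since unreferenced segments are released as soon as the last snapshot using them expires.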
