Our index design:
The alias "my_alias" points to the two indexes "myindex_a_2016_0106" and "myindex_a_2016_0712".
Insert data: During the first half of 2016 we insert only into "myindex_a_2016_0106"; during the second half of 2016 we insert only into "myindex_a_2016_0712". In the future, new indexes such as myindex_a_2017_**** will be created to receive new data.
Query data: We always query through the alias "my_alias", never through a specific index.
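To make the design concrete, here is a rough Python sketch of how the write index is chosen and how the alias body is built. The helper names (write_index_for, alias_actions) are made up for illustration and are not our real code; the returned dict is meant as a body for the POST /_aliases API.

```python
from datetime import date
from typing import Dict, List

ALIAS = "my_alias"
INDEX_PREFIX = "myindex_a"  # prefix taken from the index names above


def write_index_for(day: date) -> str:
    """Return the half-year index that receives inserts on the given day."""
    half = "0106" if day.month <= 6 else "0712"
    return f"{INDEX_PREFIX}_{day.year}_{half}"


def alias_actions(indices: List[str], alias: str = ALIAS) -> Dict:
    """Build a body for POST /_aliases that adds every index to the alias."""
    return {"actions": [{"add": {"index": name, "alias": alias}} for name in indices]}


if __name__ == "__main__":
    print(write_index_for(date(2016, 3, 15)))
    print(write_index_for(date(2016, 10, 1)))
    print(alias_actions(["myindex_a_2016_0106", "myindex_a_2016_0712"]))
```

Inserts go to whatever write_index_for returns for the current date, while all searches go to my_alias.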
Our pain:
As the data in our ES indexes grows, backups take longer and longer. Each backup currently takes about 30 minutes, and backups run hourly to keep the disaster recovery environment in sync with production, so ES spends about 50% of its time backing up. We will also have more indexes in the future, so the backup time will grow even further.
Possible solution:
I am looking for a way to reduce the backup time. Currently we back up all indexes. Is it feasible to back up only the current index "myindex_a_2016_0712", which is the only index receiving inserts? My understanding was that the files under "myindex_a_2016_0106" should NOT change in the second half of the year, since I have NOT inserted any data into it. But they actually were updated; see the following pastes. Some of the files were modified in Aug and Oct!! Why were they updated in the second half of the year? Is it because of the queries through the alias?
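The partial backup I have in mind would look roughly like the sketch below. It only builds the path and body for a create-snapshot request (PUT /_snapshot/<repository>/<snapshot>) restricted to one index through the "indices" field. The repository name "my_backup_repo" and the snapshot naming scheme are placeholders I invented for this post, not our real configuration.

```python
from datetime import datetime
from typing import Dict, List, Tuple

REPOSITORY = "my_backup_repo"  # hypothetical repository name, not our real one


def snapshot_request(indices: List[str], taken_at: datetime) -> Tuple[str, Dict]:
    """Return (path, body) for a create-snapshot request limited to `indices`."""
    # Snapshot names must be lowercase, so use a plain timestamp-based name.
    name = "snapshot_" + taken_at.strftime("%Y%m%d_%H%M")
    path = f"/_snapshot/{REPOSITORY}/{name}"
    # "indices" restricts the snapshot to the listed indexes (comma-separated).
    body = {"indices": ",".join(indices)}
    return path, body


if __name__ == "__main__":
    path, body = snapshot_request(["myindex_a_2016_0712"], datetime(2016, 10, 20, 13, 0))
    print("PUT", path)
    print(body)
```

With this, the hourly job would snapshot only "myindex_a_2016_0712" and skip "myindex_a_2016_0106" entirely, which is exactly the part I am unsure about.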
How can I reduce the backup time in my case?
If we back up only the current index, will that significantly reduce the backup time in the future? My concern is that ES itself is updating files in "myindex_a_2016_0106" while we would no longer back it up. Will any data be lost in the disaster recovery environment after restoring the snapshots? Also, we do not restore every snapshot; we only restore the latest one on a regular schedule.
Besides the idea above, do you have any other recommendations for reducing the backup duration?