I want to focus on the security of my log data. I currently take a daily backup to protect against server problems. Would monthly backups be enough to recover the data from before a server failure, or is there another way?
What combination of shards and indices should I use, given the situation described in my first question?
The snapshot frequency determines how much data you may lose if you need to revert to a snapshot. With a daily snapshot you can lose at most 1 day's worth of data. Taking snapshots monthly sounds very unusual unless you have very static data.
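To make the daily versus monthly trade-off concrete, here is a minimal sketch that computes how much data would be lost for a given snapshot schedule. The dates and the `data_loss_window` helper are made up for illustration and have nothing to do with the Elasticsearch API.

```python
from datetime import datetime, timedelta


def data_loss_window(snapshot_times, failure_time):
    """Return how much time's worth of writes is lost when reverting to
    the most recent snapshot taken at or before failure_time."""
    usable = [t for t in snapshot_times if t <= failure_time]
    if not usable:
        raise ValueError("no snapshot exists before the failure")
    return failure_time - max(usable)


# Hypothetical schedule: snapshots at midnight, server fails late on Jan 30.
start = datetime(2024, 1, 1)
daily = [start + timedelta(days=d) for d in range(31)]
monthly = [start]
failure = datetime(2024, 1, 30, 23, 0)

daily_loss = data_loss_window(daily, failure)
monthly_loss = data_loss_window(monthly, failure)
print("daily schedule loses:  ", daily_loss)
print("monthly schedule loses:", monthly_loss)
```

With these assumed dates the daily schedule loses under a day of writes, while the monthly one loses almost the whole month.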
So let's say I have 10 snapshots, 1 snapshot a day. How do I restore all the data from the 10 days when I lose it? Do I have to restore every snapshot on its own and merge them together into 1 index, or is there a better solution? Because if I restore only snapshot 10, I will only get the data from day 10, right?
Each snapshot by default contains the full data set in the cluster at the time it was created, so you restore the snapshot corresponding to the point in time you want to go back to.
Yes, but as you said, segments that have not changed will not be snapshotted again. I have time series data, and data which is inserted will never be changed. So when I take a snapshot on day 10, it will only back up data from day 10, because the other 9 snapshots already covered days 1 to 9. In fact, I observe this behavior in one of my setups. I have 5 snapshots over the course of 5 years of data. I restored snapshot 5 and only got back the data from year 5.
So am I right that I have to restore all 5 snapshots and merge them together into 1 index so that I get my original data back? Hope you understand what I am trying to say^^
A snapshot taken on day 10 will link to and reuse segments that were copied by earlier snapshots, so it still represents the full cluster content at that time. If you remove an older snapshot that first included a segment, the segment will be retained as long as newer snapshots refer to it. A segment will only be deleted once no snapshots rely on it.
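The reuse and cleanup behaviour described above can be illustrated with a small toy model. This is not Elasticsearch code: `ToyRepository` and the segment names are invented, and the model ignores segment merges. It only shows the bookkeeping idea that each snapshot references every live segment while copying only the new ones.

```python
class ToyRepository:
    """Toy model of an incremental snapshot repository (illustration only)."""

    def __init__(self):
        self.blobs = {}      # segment name -> stored segment data
        self.snapshots = {}  # snapshot name -> set of referenced segment names

    def create(self, name, live_segments):
        # Copy only the segments the repository does not hold yet ...
        for seg, data in live_segments.items():
            self.blobs.setdefault(seg, data)
        # ... but reference every segment that is live right now.
        self.snapshots[name] = set(live_segments)

    def restore(self, name):
        return {seg: self.blobs[seg] for seg in self.snapshots[name]}

    def delete(self, name):
        del self.snapshots[name]
        # A segment is removed only once no remaining snapshot refers to it.
        still_used = set().union(*self.snapshots.values())
        for seg in list(self.blobs):
            if seg not in still_used:
                del self.blobs[seg]


repo = ToyRepository()
live = {}
for day in range(1, 11):
    # Append-only time series: one new segment per day, old ones unchanged.
    live[f"seg_day{day}"] = f"docs written on day {day}"
    repo.create(f"snap{day}", dict(live))

restored = repo.restore("snap10")
print(len(restored))             # 10: snap10 covers all ten days

repo.delete("snap1")
print("seg_day1" in repo.blobs)  # True: snap2..snap10 still reference it
```

In this model, restoring only the day-10 snapshot returns the data from all ten days, as long as that data was still in the index when the snapshot was taken.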
And what if data from the index is deleted? Is there a good way to restore it?
I have a hot node which contains 2 months' worth of data. I take snapshots of that hot node. Old data gets deleted and new data is added. Now I take a second snapshot.
1st snapshot: contains 2 months of data.
2nd snapshot: contains the new data and doesn't contain the old data.
So how can I restore all the data, even the data that was already deleted at the time snapshot 2 was taken? My current method is to restore every snapshot on its own and reindex them into 1 index. This index then contains all the data. Is this the right approach? Sorry for complicating things, but I think we both misunderstood each other's use cases. Hope everything is clear now.
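For reference, the "restore each snapshot separately, then reindex into one index" method described above could be sketched as the following sequence of REST requests. The repository name `my_repo`, the snapshot names, the index name `logs` and the destination `logs-merged` are all hypothetical. The code only builds and prints the request paths and bodies; it does not talk to a cluster. It assumes the restore API's `rename_pattern` / `rename_replacement` options and the `_reindex` API with a list of source indices.

```python
import json


def restore_request(repository, snapshot, index, suffix):
    """Build the request for restoring one index under a new name,
    so that it does not collide with other restored copies."""
    path = f"/_snapshot/{repository}/{snapshot}/_restore"
    body = {
        "indices": index,
        "include_global_state": False,
        "rename_pattern": "(.+)",
        "rename_replacement": f"$1-{suffix}",
    }
    return "POST", path, body


def reindex_request(source_indices, dest_index):
    """Build the request that copies all restored copies into one index."""
    body = {"source": {"index": source_indices}, "dest": {"index": dest_index}}
    return "POST", "/_reindex", body


# Hypothetical names, for illustration only.
snapshots = ["snapshot_1", "snapshot_2"]
requests = [restore_request("my_repo", s, "logs", s) for s in snapshots]
restored_names = [f"logs-{s}" for s in snapshots]
requests.append(reindex_request(restored_names, "logs-merged"))

for method, path, body in requests:
    print(method, path, json.dumps(body))
```

As far as I know, documents that share an ID across the restored copies will overwrite one another in the destination, so the merged index should end up as the union of the snapshots. That is worth verifying on a test cluster first.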