Hi, my current understanding is that snapshots are incremental at the level of the files (segments) used by the indices. When segments are merged, even if no new data is indexed, the next snapshot will capture the difference caused by the segment change.
Does that mean it will copy duplicate data to the snapshot repository?
When I restore indices from snapshots that contain duplicate segments, will it result in duplicate documents?
Say snapshot1 contains segment1 and segment2. Then segment1 and segment2 are merged into segment3, so snapshot2 will contain segment3. If I restore from those two snapshots, how does it know which is the correct segment to use?
Just want to make sure I understand it correctly. Each snapshot represents a copy of the entire cluster, but only the delta between the previous snapshot and the current one is actually copied. Does that mean we only need the most recent snapshot for backup, and that it is safe to delete all older snapshots?
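
To make the scenario concrete, here is a toy model of how I picture it. This is only a sketch of my understanding, not Elasticsearch's implementation: `ToyRepository` and its methods are hypothetical names I made up, and I am assuming that a restore always reads from one named snapshot and that deleting a snapshot only removes files no other snapshot still references.

```python
# Toy model only: ToyRepository and its methods are hypothetical,
# not Elasticsearch APIs. Segment files are represented by their names.
class ToyRepository:
    def __init__(self):
        self.blobs = set()     # segment files physically stored in the repository
        self.snapshots = {}    # snapshot name -> set of segment files it references

    def snapshot(self, name, live_segments):
        """Record a snapshot; upload only the files the repository lacks."""
        live = set(live_segments)
        new_files = live - self.blobs      # the incremental part
        self.blobs |= new_files
        self.snapshots[name] = live
        return new_files

    def restore(self, name):
        """Assumption: a restore uses the file list of one snapshot only."""
        return sorted(self.snapshots[name])

    def delete(self, name):
        """Drop a snapshot; remove only files no other snapshot references."""
        removed = self.snapshots.pop(name)
        still_referenced = set().union(*self.snapshots.values())
        orphans = removed - still_referenced
        self.blobs -= orphans
        return orphans


repo = ToyRepository()
copied1 = repo.snapshot("snapshot1", ["segment1", "segment2"])
# segment1 and segment2 get merged into segment3 before the next snapshot
copied2 = repo.snapshot("snapshot2", ["segment3"])

print(sorted(copied2))                    # only segment3 is uploaded
print(repo.restore("snapshot2"))          # restore sees segment3 only
print(sorted(repo.delete("snapshot1")))   # segment1 and segment2 become orphans
print(repo.restore("snapshot2"))          # snapshot2 is still complete
```

If this picture is right, restoring snapshot2 never touches segment1 or segment2, so there would be no duplicate documents, and deleting snapshot1 would not break snapshot2. Please correct me if the real behaviour differs.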