How Elasticsearch snapshots work when segments are merged


Hi, my current understanding is that a snapshot is incremental at the file (segment) level, across the files used by the indices. When segments are merged, even if no new data is indexed, the next snapshot will capture the difference caused by the segment change.

  1. Does that mean it will copy duplicate data to the snapshot repository?
  2. When I restore indices from snapshots that contain duplicate segments, will it result in duplicate documents?

(David Pilato) #2
  1. Yes
  2. No. When you restore, only the right segments are restored, not the old ones.


How does it know the right segment to restore?

Say snapshot1 contains segment1 and segment2. Then segment1 and segment2 are merged into segment3. Snapshot2 will contain segment3. If I restore based on those two snapshots, how does it know which is the correct segment to use?

(David Pilato) #4

snapshot2 knows that segment3 is used, and not segment1 and segment2. So if you restore snapshot2, only segment3 is restored. You don't restore two snapshots.

Think of a snapshot as a full backup.
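
To make the segment1/segment2/segment3 example concrete, here is a minimal toy model in Python. It is only a sketch of the idea described above, not how Elasticsearch is actually implemented: the `ToyRepository` class and its method names are hypothetical, and segment files are reduced to plain strings.

```python
# Toy model only: class and method names are illustrative assumptions,
# not Elasticsearch internals.
class ToyRepository:
    def __init__(self):
        self.blobs = set()      # segment files physically stored in the repository
        self.snapshots = {}     # snapshot name -> segment files it references

    def snapshot(self, name, live_segments):
        """Record a snapshot; copy only the files the repository lacks."""
        live = set(live_segments)
        copied = live - self.blobs          # the incremental part
        self.blobs |= copied
        self.snapshots[name] = frozenset(live)
        return copied

    def restore(self, name):
        """Return exactly the files the named snapshot references."""
        return set(self.snapshots[name])


repo = ToyRepository()
copied1 = repo.snapshot("snapshot1", {"segment1", "segment2"})
# segment1 and segment2 are merged into segment3; no new documents are indexed
copied2 = repo.snapshot("snapshot2", {"segment3"})
restored = repo.restore("snapshot2")

print(sorted(copied2))     # files copied by the second snapshot
print(sorted(restored))    # files restored from snapshot2
print(sorted(repo.blobs))  # files now held in the repository
```

In this model the second snapshot copies segment3 even though it holds the same documents (the duplicate data from question 1), while restoring snapshot2 returns only segment3, so no documents are duplicated.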


Just want to make sure I understand it correctly. Each snapshot represents a copy of the entire cluster, but only copies the delta between the previous snapshot and the current one. Does that mean we only need the most recent snapshot for backup, and it is safe to delete all old snapshots?

(David Pilato) #6

This is correct.
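
One way to see why deleting old snapshots is safe is a toy model where deletion removes only the files that no remaining snapshot references. This is a sketch under that assumption, with a hypothetical `delete_snapshot` helper and made-up segment names; it is not Elasticsearch's actual code.

```python
# Toy model only: hypothetical helper, not Elasticsearch's implementation.
def delete_snapshot(snapshots, blobs, name):
    """Drop one snapshot, then keep only files some remaining snapshot needs."""
    remaining = {k: v for k, v in snapshots.items() if k != name}
    still_needed = set().union(*remaining.values())
    return remaining, blobs & still_needed


snapshots = {
    "snapshot1": {"segment1", "segment2"},
    "snapshot2": {"segment2", "segment4"},  # segment2 unchanged, segment4 is new
}
blobs = {"segment1", "segment2", "segment4"}

snapshots, blobs = delete_snapshot(snapshots, blobs, "snapshot1")

print(sorted(snapshots))  # snapshots still available
print(sorted(blobs))      # files still stored in the repository
```

Here segment2 survives the deletion of snapshot1 because snapshot2 still references it, so snapshot2 remains fully restorable on its own.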
