Curator: Snapshotting Yesterday's Index

dawiro · May 23, 2018, 12:49pm

Hi,
I'm using Curator 5.5.1 and specifying tasks in YAML-formatted config files. I would appreciate it if someone could explain how I can snapshot an index that has rolled over.

Regards,
D

theuntergeek · May 23, 2018, 3:39pm

It's far easier to snapshot all indices that have rolled over. Data isn't duplicated in snapshot repositories, especially for rolled-over indices, which no longer change.

    filters:
    - filtertype: pattern
      kind: prefix
      value: metricbeat
    - filtertype: alias
      aliases: metricbeat
      exclude: True

Given indices named metricbeat-######, and a rollover alias of metricbeat, these filters will select all indices which have rolled over by excluding the index associated with the metricbeat alias.
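For context, a complete Curator 5.x action file wrapping those filters in a snapshot action might look like the sketch below. The repository name `my_backup_repo` and the snapshot name pattern are placeholders (assumptions, not from this thread), and the repository must already be registered with Elasticsearch.

    actions:
      1:
        action: snapshot
        description: "Snapshot all rolled-over metricbeat indices"
        options:
          # Placeholder: the name of an already-registered snapshot repository
          repository: my_backup_repo
          # Curator expands strftime-style codes in the snapshot name
          name: 'metricbeat-rolled-%Y%m%d%H%M%S'
          ignore_unavailable: False
          include_global_state: False
          partial: False
          wait_for_completion: True
        filters:
        # Start from every index whose name begins with "metricbeat"
        - filtertype: pattern
          kind: prefix
          value: metricbeat
        # Drop the index the rollover alias currently points to
        - filtertype: alias
          aliases: metricbeat
          exclude: True

A file like this is run with `curator --config curator.yml action.yml`; adding `--dry-run` logs which indices would be selected without taking a snapshot, which is a cheap way to check the filters first.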

If you are still determined to only snapshot the most recently rolled index, you could do this by adding the count filter to the above:

    - filtertype: count
      count: 1
      use_age: true
      source: creation_date

dawiro · May 25, 2018, 12:02pm

In the logging scenario, today's index is the most important one, so it makes sense to snapshot it periodically throughout the day.

theuntergeek · May 25, 2018, 12:45pm

That is not what you originally asked. You asked how to snapshot the "rolled over" index.

To snapshot the current index:

    filters:
    - filtertype: alias
      aliases: metricbeat
      exclude: false

This will select only the index identified by the named alias.
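As a sketch, a snapshot action built around that alias filter could look like this. The repository name `live_repo` is hypothetical, and the name includes hours, minutes and seconds so that several runs on the same day produce distinct snapshot names.

    actions:
      1:
        action: snapshot
        description: "Snapshot the index currently behind the metricbeat alias"
        options:
          # Hypothetical repository for live (not yet forcemerged) data
          repository: live_repo
          # Time-of-day in the name keeps repeated daily runs unique
          name: 'metricbeat-live-%Y%m%d%H%M%S'
          ignore_unavailable: False
          include_global_state: False
          partial: False
          wait_for_completion: True
        filters:
        # Keep only the index that the metricbeat alias points to
        - filtertype: alias
          aliases: metricbeat
          exclude: False

Curator has no built-in scheduler, so running this "periodically throughout the day" means invoking it from cron or a similar job runner.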

theuntergeek · May 25, 2018, 12:49pm

Repeatedly snapshotting an active time-series index will result in a lot of data being duplicated in your repository, due to Elasticsearch's ongoing segment merging*. While this may be useful for data protection, it is wiser to also have a snapshot repository that contains indices that are no longer written to, and are thus unlikely to change. These indices will quite likely have been forcemerged as well, reducing the segment count to a few, or even one, segment per shard.

* Segment merges are a fact of life with Lucene, the underlying technology behind Elasticsearch. While it is true that snapshots are incremental in Elasticsearch, they are incremental at the segment level, not at the data level. As new data comes in, existing segments are merged into new ones. Any segment not present in the previous snapshot is copied into the next one, whether or not the data in it has already been snapshotted.
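One way to produce those "cleaner" snapshots is to forcemerge the rolled-over indices first and snapshot them afterwards, in a single action file. The sketch below assumes a hypothetical `complete_repo` repository and a 120-second delay between merges; both values are illustrative, not recommendations from this thread.

    actions:
      1:
        action: forcemerge
        description: "Forcemerge rolled-over metricbeat indices to 1 segment per shard"
        options:
          max_num_segments: 1
          # Seconds to pause between indices, to limit load on the cluster
          delay: 120
          # Carry on to action 2 even if every index is already merged
          ignore_empty_list: True
        filters:
        - filtertype: pattern
          kind: prefix
          value: metricbeat
        # Never merge the index that is still being written to
        - filtertype: alias
          aliases: metricbeat
          exclude: True
        # Skip indices already at 1 segment per shard
        - filtertype: forcemerged
          max_num_segments: 1
          exclude: True
      2:
        action: snapshot
        description: "Snapshot the completed, forcemerged indices"
        options:
          # Hypothetical repository holding only completed indices
          repository: complete_repo
          name: 'metricbeat-complete-%Y%m%d%H%M%S'
          ignore_unavailable: False
          include_global_state: False
          partial: False
          wait_for_completion: True
        filters:
        - filtertype: pattern
          kind: prefix
          value: metricbeat
        - filtertype: alias
          aliases: metricbeat
          exclude: True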

dawiro · May 29, 2018, 9:31am

Well @theuntergeek, your reply prompted me to provide more information, which widened the scope of the discussion. If I'm incrementally backing up the active index, I'll also need to back it up after it rolls over. That was the original question...
If I change to size-based rolling, then I'll be able to get away from snapshotting the active index. That won't happen any time soon, however. And I struggle to see how I can tolerate a 24-hour window of potential data loss for the data that is also the most important.

dawiro · May 29, 2018, 9:41am

@theuntergeek would this work as a filter?

    - filtertype: period
      period_type: relative
      source: name
      unit: days
      # Offsets count from now: -1 = yesterday, 0 = today (positive = future)
      range_from: -1
      range_to: 0
      timestring: '%Y.%m.%d'

Basically, back up today's and yesterday's indices.
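A full action file using a period filter for today and yesterday might look like the sketch below. It assumes daily indices named like `metricbeat-2018.05.29` (adjust the prefix and `timestring` to your naming) and a hypothetical `live_repo` repository. It also assumes Curator's relative-period convention, where 0 is the current unit and -1 the previous one.

    actions:
      1:
        action: snapshot
        description: "Snapshot today's and yesterday's daily indices"
        options:
          # Hypothetical repository name
          repository: live_repo
          name: 'metricbeat-daily-%Y%m%d%H%M%S'
          ignore_unavailable: False
          include_global_state: False
          partial: False
          wait_for_completion: True
        filters:
        - filtertype: pattern
          kind: prefix
          value: metricbeat-
        # Match indices whose name-encoded date is yesterday (-1) or today (0)
        - filtertype: period
          period_type: relative
          source: name
          unit: days
          range_from: -1
          range_to: 0
          timestring: '%Y.%m.%d'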

theuntergeek · May 29, 2018, 1:13pm

Do you not have any replicas of your indices? While replicas are not backups, for many users, they are sufficient redundancy until a proper (i.e. the shards are forcemerged) snapshot can be taken.

theuntergeek · May 29, 2018, 1:18pm

Yes, that period filter should work as described. However, I still wouldn't put forcemerged indices in the same repository as incomplete ones. This is most frequently accomplished with two repositories: one for "live" data that is not forcemerged, and one for data that is complete. You can delete data from the "live" snapshot repository as soon as you know it is in the "complete" repository.

This is a common approach to cover the bases where you want to have snapshots of live data, even multiples per day, but still want cleaner snapshots with few segments for indices which have been forcemerged and/or shrunk.
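Pruning the "live" repository can itself be a Curator action. In this sketch the repository name `live_repo`, the `metricbeat-live-` snapshot prefix and the two-day retention are all assumptions. The retention only needs to be longer than the gap between an index rolling over and its snapshot landing in the "complete" repository.

    actions:
      1:
        action: delete_snapshots
        description: "Prune live snapshots once the data is in the complete repository"
        options:
          # Hypothetical repository holding intra-day snapshots
          repository: live_repo
          # Retry if another snapshot operation is in progress
          retry_interval: 120
          retry_count: 3
        filters:
        # Only touch snapshots created by the live snapshot job
        - filtertype: pattern
          kind: prefix
          value: metricbeat-live-
        # Assumed retention window: older than 2 days
        - filtertype: age
          source: creation_date
          direction: older
          unit: days
          unit_count: 2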

dawiro · May 29, 2018, 1:20pm

This is an interesting point. I'll probably do that. I don't think I'll be able to get management to accept relying on replicas until we can snapshot a completed index...

system · June 26, 2018, 1:20pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
