Curator: Snapshotting Yesterday's Index


#1

Hi,
I'm using Curator 5.5.1 and specifying tasks in YAML-formatted config files. I would appreciate it if someone could explain how I can snapshot an index that has rolled over.

Regards,
D


(Aaron Mildenstein) #2

It's far easier to snapshot all indices that have rolled over. Data isn't duplicated in snapshot repositories, especially for rolled-over indices, which no longer change.

    filters:
    - filtertype: pattern
      kind: prefix
      value: metricbeat
    - filtertype: alias
      aliases: metricbeat
      exclude: True

Given indices named metricbeat-######, and a rollover alias of metricbeat, these filters will select all indices which have rolled over by excluding the index associated with the metricbeat alias.

If you are still determined to only snapshot the most recently rolled index, you could do this by adding the count filter to the above:

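    - filtertype: count
      count: 1
      use_age: true
      source: creation_date
      # count excludes its matches unless told otherwise;
      # exclude: false keeps only the newest remaining index
      exclude: false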
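
For context, here is a minimal sketch of how the filters above might sit inside a complete Curator 5.x action file. The repository name `my_repo` and the snapshot name pattern are assumptions for illustration; the repository must already be registered in Elasticsearch.

```yaml
actions:
  1:
    action: snapshot
    description: >-
      Snapshot every metricbeat index that has rolled over.
    options:
      repository: my_repo                 # hypothetical repository name
      name: 'metricbeat-%Y%m%d%H%M%S'     # assumed naming pattern
      ignore_unavailable: False
      include_global_state: True
      partial: False
      wait_for_completion: True
      skip_repo_fs_check: False
    filters:
    # All indices whose names start with "metricbeat"...
    - filtertype: pattern
      kind: prefix
      value: metricbeat
    # ...minus the one the rollover alias currently points to.
    - filtertype: alias
      aliases: metricbeat
      exclude: True
```

Running Curator with `--dry-run` first shows which indices the filters select without taking a snapshot.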

#3

In the logging scenario, today's index is the most important one, so it makes sense to snapshot it periodically throughout the day.


(Aaron Mildenstein) #4

That is not what you originally asked. You asked how to snapshot the "rolled over" index.

To snapshot the current index:

    filters:
    - filtertype: alias
      aliases: metricbeat
      exclude: false

This will select only the index identified by the named alias.


(Aaron Mildenstein) #5

Repeatedly snapshotting an active time-series index will result in a lot of data being duplicated in your repository, due to Elasticsearch's ongoing segment merging*. While this may be useful for data protection, it is wiser to also have a snapshot repository that contains only indices that are no longer written to and are thus unlikely to change. These indices will quite likely have been forcemerged as well, reducing the segment count to a few segments, or even 1 segment, per shard.

* Segment merges are a fact of life with Lucene, the underlying technology behind Elasticsearch. While it is true that snapshots are incremental in Elasticsearch, they are incremental at the segment level, not at the data level. As new data comes in, segments merge and become new segments. Segments not present in the last snapshot will be copied to the next one, whether the data in them has been snapshotted already or not.
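
As a rough sketch, the forcemerge step mentioned above could be done with a Curator action like the following. The `delay` and `timeout_override` values are assumptions and would need tuning for your cluster.

```yaml
actions:
  1:
    action: forcemerge
    description: >-
      Forcemerge rolled-over metricbeat indices to 1 segment per shard.
    options:
      max_num_segments: 1
      delay: 120                # assumed pause (seconds) between indices
      timeout_override: 21600   # assumed; forcemerges can run for hours
      continue_if_exception: False
    filters:
    - filtertype: pattern
      kind: prefix
      value: metricbeat
    # Never forcemerge the index that is still being written to.
    - filtertype: alias
      aliases: metricbeat
      exclude: True
    # Skip indices that are already at 1 segment per shard.
    - filtertype: forcemerged
      max_num_segments: 1
      exclude: True
```

Once an index has been forcemerged and is no longer written to, its segments stop changing, so later snapshots of it copy nothing new.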


#6

Well @theuntergeek, your reply prompted me to provide more information, which widened the scope of the discussion. If I'm incrementally backing up the active index, I'll also need to back it up after it rolls. That was the original question...
If I change to size-based rolling, I'll be able to get away from snapshotting the active index. That won't happen any time soon, however. And I struggle to see how I can tolerate a 24-hour window of potential data loss for the data that is also the most important.


#7

@theuntergeek would this work as a filter?

    - filtertype: period
      period_type: relative
      source: name
      unit: days
      # offsets are relative to the current day: -1 is yesterday, 0 is today
      range_from: -1
      range_to: 0
      timestring: '%Y.%m.%d'

Basically, back up today's and yesterday's index.


(Aaron Mildenstein) #8

Do you not have any replicas of your indices? While replicas are not backups, for many users, they are sufficient redundancy until a proper (i.e. the shards are forcemerged) snapshot can be taken.


(Aaron Mildenstein) #9

Yes, that period filter should work as described. However, I still wouldn't put forcemerged indices together with incomplete ones. Keeping them apart is most often done with two repositories: one for "live" data that is not forcemerged, and one for data that is complete. You can delete data from the "live" snapshot repository as soon as you know it is in the "complete" repository.

This is a common approach to cover the bases where you want to have snapshots of live data, even multiples per day, but still want cleaner snapshots with few segments for indices which have been forcemerged and/or shrunk.
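
A minimal sketch of that two-repository layout as a single action file follows. The repository names `live_repo` and `complete_repo`, the snapshot name prefixes, and the 2-day retention are all assumptions for illustration. The age cutoff does not itself verify that the data has reached the "complete" repository, so pick a value that safely covers your rollover and forcemerge schedule.

```yaml
actions:
  1:
    action: snapshot
    description: Snapshot the active write index into the "live" repository.
    options:
      repository: live_repo            # hypothetical repository name
      name: 'live-%Y%m%d%H%M%S'        # assumed naming pattern
      wait_for_completion: True
    filters:
    - filtertype: alias
      aliases: metricbeat
      exclude: False
  2:
    action: snapshot
    description: Snapshot rolled-over indices into the "complete" repository.
    options:
      repository: complete_repo        # hypothetical repository name
      name: 'complete-%Y%m%d%H%M%S'    # assumed naming pattern
      wait_for_completion: True
    filters:
    - filtertype: pattern
      kind: prefix
      value: metricbeat
    - filtertype: alias
      aliases: metricbeat
      exclude: True
  3:
    action: delete_snapshots
    description: Remove "live" snapshots older than 2 days.
    options:
      repository: live_repo
      retry_interval: 120
      retry_count: 3
    filters:
    - filtertype: pattern
      kind: prefix
      value: live-
    - filtertype: age
      source: creation_date
      direction: older
      unit: days
      unit_count: 2                    # assumed retention window
```

Action 1 can be scheduled several times a day, while actions 2 and 3 only need to run after each rollover (and after any forcemerge), for example from separate action files on different cron schedules.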


#10

This is an interesting point. I'll probably do that. Don't think I'll be able to get management to accept relying on replicas until we can snapshot a completed index...

