I'm working on a new snapshot solution. I've read all the documentation and searched the forum discussions, but did not find what I was looking for.
So, I have an Elasticsearch cluster with tons of data and two types of indices:
logstash-, which are daily indices with ~8 GB of data each, and
aggregate-, which are very small (100 KB to 1 MB) and hold aggregated data from the previous ones.
Because the aggregated data is very important, I don't want to lose it. Right now I take a daily snapshot of all indices, but it takes a couple of hours to finish, it's hard to delete some of the data, and it takes up a lot of space (cluster data ~2 TB).
What do I need?
logstash- indices need to be snapshotted. Let's say I want to keep the last 6 months in snapshots and the last 3 months in ES so I can see graphs in Kibana; anything older needs to be deleted.
But all aggregate- indices need to be in snapshots, with the last 6 months of them in ES. So I don't want all indices in snapshots, to save space, and I also don't need all the data in ES, to save disk space on the machines.
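The retention rules above could be sketched roughly like this in Python. This is only an illustration: the function names, the RETENTION table layout, and the approximation of a month as 30 days are assumptions, and the index names are assumed to end in a date formatted like 2018.08.23.

```python
from datetime import date, datetime

# Assumption: a "month" is approximated as 30 days.
DAYS_PER_MONTH = 30

# Retention in months per index prefix; None means "keep forever".
RETENTION = {
    "logstash-": {"es": 3, "snapshot": 6},
    "aggregate-": {"es": 6, "snapshot": None},
}


def index_date(index_name, prefix):
    """Parse the date suffix of a daily index, e.g. logstash-2018.08.23."""
    return datetime.strptime(index_name[len(prefix):], "%Y.%m.%d").date()


def retention_decision(index_name, today):
    """Return whether the index should stay in ES and in the snapshot store."""
    for prefix, rule in RETENTION.items():
        if index_name.startswith(prefix):
            age_days = (today - index_date(index_name, prefix)).days

            def keep(months):
                return months is None or age_days <= months * DAYS_PER_MONTH

            return {
                "keep_in_es": keep(rule["es"]),
                "keep_snapshot": keep(rule["snapshot"]),
            }
    raise ValueError("unknown index prefix: " + index_name)


if __name__ == "__main__":
    today = date(2018, 8, 24)
    for name in ("logstash-2018.08.23", "logstash-2018.04.01", "aggregate-2017.01.01"):
        print(name, retention_decision(name, today))
```

A daily job could call retention_decision for every index and every snapshot, then delete whatever is no longer marked as kept.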
So, I was thinking about having one snapshot per index. Let's say that logstash-2018.08.23 will not get any new data, so on 24.08.2018 at 1:00 a snapshot is taken (logstash-2018.08.23-backup) that contains only the data from this one index. After that I aggregate that data and take the same kind of snapshot for the aggregate- index.
But I've read that this can slow down taking snapshots. So, as a first step, I will split those indices across two separate storage locations: one for logstash- indices and a second for aggregate- indices.
I wanted to ask what impact taking one snapshot per index will have on the cluster and on snapshot time. It's easier to have a snapshot per index, because when I want to delete some index data from storage I just delete logstash-2018.08.23-backup, and all data for index logstash-2018.08.23 is deleted, because no other snapshot contains information about this index.
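A minimal sketch of the one-snapshot-per-index idea, expressed as the requests that would be sent to the Elasticsearch snapshot API (PUT and DELETE on /_snapshot/<repository>/<snapshot>). The repository names logstash_backups and aggregate_backups and the helper function names are hypothetical; the functions only build the requests and do not talk to a cluster.

```python
# Hypothetical repository names, one per index type.
REPOSITORIES = {
    "logstash-": "logstash_backups",
    "aggregate-": "aggregate_backups",
}


def repository_for(index_name):
    """Pick the snapshot repository based on the index prefix."""
    for prefix, repository in REPOSITORIES.items():
        if index_name.startswith(prefix):
            return repository
    raise ValueError("unknown index prefix: " + index_name)


def snapshot_name(index_name):
    """Name the snapshot after the single index it contains."""
    return index_name + "-backup"


def create_snapshot_request(index_name):
    """Build the request that snapshots exactly one index."""
    path = "/_snapshot/{}/{}".format(repository_for(index_name), snapshot_name(index_name))
    body = {
        "indices": index_name,          # only this index goes into the snapshot
        "include_global_state": False,  # skip cluster-wide state
    }
    return "PUT", path, body


def delete_snapshot_request(index_name):
    """Build the request that removes the snapshot of one index."""
    path = "/_snapshot/{}/{}".format(repository_for(index_name), snapshot_name(index_name))
    return "DELETE", path


if __name__ == "__main__":
    print(create_snapshot_request("logstash-2018.08.23"))
    print(delete_snapshot_request("logstash-2018.08.23"))
```

Each tuple could be passed to any HTTP client pointed at the cluster; deleting the snapshot is then a single request per index.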
Is there any other idea for how I could set up the snapshot infrastructure? Any information will be helpful.