Hello,
I'm working on a new snapshot solution. I've read all the documentation and searched the forum discussions, but did not find what I was looking for.
So, I have an Elasticsearch cluster with tons of data and two types of indices: logstash- indices, which are daily and hold ~8 GB of data each, and aggregate- indices, which are very small (100 KB to 1 MB) and hold data aggregated from the former. Because this data is very important, I don't want to lose it. Right now I take a daily snapshot of all indices, but it takes a couple of hours to finish, it's hard to delete specific data, and it takes up a lot of space (cluster data is ~2 TB).
What do I need? The logstash- indices need to be snapshotted. Let's say I want to keep the last 6 months in snapshots and the last 3 months in ES to view graphs in Kibana; anything older needs to be deleted. All aggregate- indices, however, need to be kept in snapshots, with the last 6 months of them in ES. So I don't want all indices in snapshots, to save space, and I don't need all the data in ES either, to save disk space on the machines.
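To make the retention rules above concrete, here is a rough Python sketch of how they could be checked per index. The names `RETENTION`, `index_date`, `whole_months_between`, and `expired` are made up for illustration, and counting "months" as whole calendar months is an assumption on my part.

```python
from datetime import date, datetime

# Retention in months, taken from the rules described above.
# None means "never delete from this location".
RETENTION = {
    "logstash-": {"cluster": 3, "snapshot": 6},
    "aggregate-": {"cluster": 6, "snapshot": None},
}


def index_date(index_name):
    """Parse the trailing YYYY.MM.DD from a name like logstash-2018.08.23."""
    return datetime.strptime(index_name.rsplit("-", 1)[1], "%Y.%m.%d").date()


def whole_months_between(older, newer):
    """Number of complete calendar months from `older` to `newer`."""
    months = (newer.year - older.year) * 12 + (newer.month - older.month)
    if newer.day < older.day:
        months -= 1
    return months


def expired(index_name, today):
    """Return where the index is past its retention: True means deletable there."""
    prefix = next(p for p in RETENTION if index_name.startswith(p))
    age = whole_months_between(index_date(index_name), today)
    return {
        place: limit is not None and age >= limit
        for place, limit in RETENTION[prefix].items()
    }
```

For example, `expired("logstash-2018.08.23", date(2018, 12, 1))` reports the index as deletable from the cluster but not yet from the snapshots.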
So, I was thinking about having one snapshot per index. Let's say that logstash-2018.08.23 will not receive any new data, so on 24.08.2018 at 1:00 a snapshot is taken (logstash-2018.08.23-backup) that contains only the data from this one index. After that I aggregate the data and take the same kind of snapshot for the aggregate-2018.08.23 index.
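The per-index snapshot described above could be requested roughly like this. It is only a sketch that builds the request for the create-snapshot endpoint (`PUT /_snapshot/<repository>/<snapshot>`) without sending it; the helper names and the repository name `logstash_repo` are placeholders I invented, and `include_global_state: false` is my assumption that cluster state is not needed in every per-index snapshot.

```python
from datetime import date, timedelta


def yesterdays_index(prefix, today):
    """Name of the daily index that stopped receiving data at midnight."""
    return prefix + (today - timedelta(days=1)).strftime("%Y.%m.%d")


def snapshot_request(repository, index_name):
    """Build (method, path, body) for a snapshot holding exactly one index."""
    snapshot_name = f"{index_name}-backup"
    path = f"/_snapshot/{repository}/{snapshot_name}?wait_for_completion=true"
    body = {
        "indices": index_name,          # only this index goes into the snapshot
        "include_global_state": False,  # assumption: skip cluster state
    }
    return "PUT", path, body


# Run at 1:00 on 24.08.2018: snapshot the index for the previous day.
index = yesterdays_index("logstash-", date(2018, 8, 24))
method, path, body = snapshot_request("logstash_repo", index)
```

The resulting tuple can be passed to whatever HTTP client or scheduler is already in use.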
But I've read that this can slow down snapshotting. So, as a first step, I will split those indices across two separate storage locations: one for the logstash- indices and a second for the aggregate- indices.
I wanted to ask what impact taking one snapshot per index will have on the cluster and on snapshot time. It's easier to have a snapshot per index, because when I want to delete an index's data from storage I just delete logstash-2018.08.23-backup, and all data for the index logstash-2018.08.23 is deleted, because no other snapshot contains information about this index.
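With one snapshot per index, the cleanup described above comes down to one delete-snapshot call (`DELETE /_snapshot/<repository>/<snapshot>`) per expired index. A minimal sketch, again only building the requests; `delete_snapshot_requests` and the repository name `logstash_repo` are hypothetical.

```python
def delete_snapshot_requests(repository, index_names):
    """Build one (method, path) pair per index whose snapshot should be removed."""
    return [
        ("DELETE", f"/_snapshot/{repository}/{name}-backup")
        for name in index_names
    ]


requests_to_send = delete_snapshot_requests(
    "logstash_repo",
    ["logstash-2018.08.22", "logstash-2018.08.23"],
)
```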
Is there any other approach I could use to set up the snapshot infrastructure? Any information will be helpful.