Snapshots for daily indices - best practice?

Hi There,

We are currently searching for the best way to back up our ES cluster. Maybe the pros could give us some advice.
Background:

  • ES-Cluster (6.2.x) running on Azure VMs
  • Repository target is a blob (using repository-azure)
  • Two types of daily indices:
    • Short-term: We will delete them after 7 days
    • Long-term: We will delete them after ~3 years (maybe moving them into an archive cluster after x months if they have an impact on the current cluster)
  • Main load will be from 12:00 to 22:00
  • If the cluster is down, we are able to buffer all new documents for around 2 days
  • The cluster grows by around 20 indices a day, each about 500MB
  • There is currently no use case where we have to write into an old index
  • Also no use case where we have to delete/update any document
  • We want to have a snapshot every hour

Our current plan is:

  • Calling a force merge on all of yesterday's indices at night
  • Creating a snapshot including all indices every hour and deleting snapshots older than ?3? days

Would this be a proper plan? Any advice? Should we add two repositories for short-term/long-term?
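
To make the plan concrete, here is a minimal sketch of the three REST calls it implies (force merge, create snapshot, delete old snapshots), written as plain Python functions that only build the requests. The repository name `azure_backup`, the snapshot naming scheme `hourly-YYYYMMDD-HH` and the index name pattern `*-YYYY.MM.DD` are assumptions for illustration, not something stated above.

```python
from datetime import datetime, timedelta

REPO = "azure_backup"  # hypothetical repository name


def forcemerge_request(now):
    """Nightly force merge of yesterday's indices (assumed pattern *-YYYY.MM.DD)."""
    yesterday = now - timedelta(days=1)
    path = f"/*-{yesterday:%Y.%m.%d}/_forcemerge?max_num_segments=1"
    return ("POST", path, None)


def snapshot_name(ts):
    """Hypothetical naming scheme; snapshot names must be lowercase."""
    return f"hourly-{ts:%Y%m%d-%H}"


def create_snapshot_request(ts):
    """Hourly snapshot of all indices, without the global cluster state."""
    body = {
        "indices": "*",
        "ignore_unavailable": True,
        "include_global_state": False,
    }
    path = f"/_snapshot/{REPO}/{snapshot_name(ts)}?wait_for_completion=false"
    return ("PUT", path, body)


def expired_snapshots(names, now, keep_days=3):
    """Return the hourly snapshots older than keep_days; other names are ignored."""
    cutoff = now - timedelta(days=keep_days)
    expired = []
    for name in names:
        try:
            taken = datetime.strptime(name, "hourly-%Y%m%d-%H")
        except ValueError:
            continue  # not one of our hourly snapshots
        if taken < cutoff:
            expired.append(name)
    return expired


def delete_snapshot_request(name):
    return ("DELETE", f"/_snapshot/{REPO}/{name}", None)
```

Each function returns a (method, path, body) tuple that could be sent with any HTTP client. The list of existing snapshot names would come from `GET /_snapshot/azure_backup/_all`.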

The question we ran into while thinking about the backups:
Is there a performance impact if we just snapshot all indices? Let's say in one year we have 3650 long-term and 70 short-term indices, but only 20 of them are actively indexed.
So there would be no need for the cluster to check 3700 indices for the snapshot, because we already have the latest state in another snapshot.
Is the snapshot API smart enough to know that lastChanged.Timestamp < lastSnapshot.Timestamp for these indices? Or will it compare all the old documents of these 3700 indices for this snapshot?
Would it be better if we call the snapshot using {"indices": "[list of all current day]"} and take a full snapshot into the same repository once per day?
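
For reference, the "current day only" variant from the last question could be sketched like this. The snapshot API takes `indices` as a comma-separated string. The index names used here and the assumption that every daily index ends in `-YYYY.MM.DD` are illustrative only.

```python
from datetime import date


def current_day_indices(all_indices, day):
    """Pick the indices whose name ends with the given day (assumed suffix -YYYY.MM.DD)."""
    suffix = f"-{day:%Y.%m.%d}"
    return sorted(name for name in all_indices if name.endswith(suffix))


def snapshot_body(indices):
    """Request body for PUT /_snapshot/<repository>/<snapshot>."""
    return {
        "indices": ",".join(indices),
        "ignore_unavailable": True,
        "include_global_state": False,
    }


# Hypothetical index names for illustration
names = ["logs-2018.03.10", "metrics-2018.03.10", "logs-2018.03.09"]
body = snapshot_body(current_day_indices(names, date(2018, 3, 10)))
```

The once-per-day full snapshot would use the same body with `"indices": "*"`.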

Best regards,
Andreas

For long-term retention you generally want to make sure you have a reasonably large average shard size, as each shard comes with memory overhead. Please read this blog post for further details and guidance.

Having 20 daily indices of just 500MB of data seems excessive, and will most likely cause problems down the road. For long-term indices that are to be retained for 3 years, daily indices may therefore not be appropriate. It may be worthwhile switching to e.g. monthly indices for these.
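
As a rough back-of-the-envelope comparison, the numbers from the question can be plugged into a few lines of Python. The figure of 10 long-term indices per day is an assumption inferred from "3650 long-term indices in one year". The rest comes from the post.

```python
LONGTERM_PER_DAY = 10   # assumption: 3650 long-term indices per year / 365 days
INDEX_SIZE_MB = 500     # from the question
RETENTION_YEARS = 3     # from the question


def longterm_index_count(indices_per_year):
    """Number of long-term indices alive at the end of the retention period."""
    return LONGTERM_PER_DAY * indices_per_year * RETENTION_YEARS


def monthly_index_size_gb():
    """Approximate size of one monthly index holding an average month of daily data."""
    return INDEX_SIZE_MB * 365 / 12 / 1000


daily_count = longterm_index_count(365)    # daily indices kept for 3 years
monthly_count = longterm_index_count(12)   # monthly indices kept for 3 years

print(daily_count, monthly_count, round(monthly_index_size_gb(), 1))
```

Under these assumptions, daily indices end up at 10950 long-term indices of 500MB each, while monthly indices end up at 360 indices of roughly 15GB each.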

Hi Christian,

thanks for your suggestions.
We will keep the monthly indices in mind. We need to check how this will impact our queries...

Regards,
Andreas

Having lots of small indices is likely to result in worse query performance as well as more overhead.
