How to back up/archive closed ES indexes

Hi All,

I've checked on here for anything similar but didn't find anything. Hopefully you have some suggestions. And before I start I'm a Linux admin so that's how I'm thinking in my post :smile:.

My setup:

  • Standard ELK setup with syslog --> logstash (parsed) and stored in ES.
  • I keep 2 weeks of indexes open and searchable. The rest are flushed and closed, but left on disk in my data directory.
  • Daily index sizes are around 8GB.

As you can imagine, this is starting to take up quite a bit of space, so I want to move them off to an archive area in case they're needed in the future. I've already tested tarring and gzipping a closed index, deleting the original directory, then unpacking it again and re-opening the index in ES, and that works.

It's clean and quick but maybe not supported.
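For reference, the tar/gzip round trip described above could be scripted roughly like this. It's only a sketch: the function names and the directory arguments are made up for illustration, it assumes the index is already flushed and closed, and it assumes you can tell which directory on disk belongs to which index (that layout depends on the ES version).

```python
import shutil
import tarfile
from pathlib import Path


def archive_index_dir(index_dir: Path, archive_dir: Path) -> Path:
    """Tar and gzip one closed index directory, then remove the original."""
    archive_dir.mkdir(parents=True, exist_ok=True)
    archive_path = archive_dir / f"{index_dir.name}.tar.gz"
    with tarfile.open(archive_path, "w:gz") as tar:
        # Store paths relative to the index directory's parent.
        tar.add(index_dir, arcname=index_dir.name)
    # Re-open the archive before deleting anything, as a basic sanity check.
    with tarfile.open(archive_path, "r:gz") as tar:
        if not tar.getnames():
            raise RuntimeError(f"empty archive: {archive_path}")
    shutil.rmtree(index_dir)
    return archive_path


def restore_index_dir(archive_path: Path, indices_dir: Path) -> Path:
    """Unpack an archive back under the indices directory."""
    with tarfile.open(archive_path, "r:gz") as tar:
        tar.extractall(indices_dir)
    return indices_dir / archive_path.name[: -len(".tar.gz")]
```

After restoring, the files need to be owned by the user ES runs as before the index is re-opened.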

I've looked at archiving and backup but I have some questions:

  • It doesn't work on closed indexes (I read something about partial: true?).
  • What happens after you run the archiving? is the original index removed from the original location while the archive exists in your archive location? The aim is to move the indexes out of their current location to save space.
  • I'm not sure this is what I'm looking for.
  • I'm happy to provide more information.

I've probably got a few things wrong but I look forward to your comments.

Regards

By "archiving and backup" I'm going to assume you mean Elasticsearch's native snapshot/restore feature.

It doesn't work on closed indexes (I read something about partial: true?).

Yes, I believe it requires indexes to be open. However, indexes can't change while they're closed so if you grab a snapshot right before you close the index you should be fine.

Any particular reason you need to close the indexes at all? If you're snapshotting them you can delete them from the cluster right away. Recovering an index from a snapshot is slower than just reopening a closed index, though.

What happens after you run the archiving? is the original index removed from the original location while the archive exists in your archive location? The aim is to move the indexes out of their current location to save space.

It's up to you, but once an index has been included in a snapshot you can delete it from the cluster.
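A rough sketch of that snapshot-then-delete sequence against the REST API, in case it helps. It assumes a snapshot repository has already been registered; the repository name, the base URL and the function names are placeholders, and the snapshot is simply named after the index.

```python
import json
import urllib.request


def build_snapshot_request(base_url: str, repo: str, index: str) -> urllib.request.Request:
    """Build the PUT request that snapshots a single index and waits for it."""
    url = f"{base_url}/_snapshot/{repo}/{index}?wait_for_completion=true"
    body = json.dumps({"indices": index}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


def snapshot_then_delete(base_url: str, repo: str, index: str) -> None:
    """Snapshot one open index, then delete it only if the snapshot succeeded."""
    with urllib.request.urlopen(build_snapshot_request(base_url, repo, index)) as resp:
        result = json.load(resp)
    state = result.get("snapshot", {}).get("state")
    if state != "SUCCESS":
        raise RuntimeError(f"snapshot of {index} ended in state {state!r}")
    delete = urllib.request.Request(f"{base_url}/{index}", method="DELETE")
    with urllib.request.urlopen(delete):
        pass
```

You'd call it as something like `snapshot_then_delete("http://localhost:9200", "my_backup", "logstash-2015.01.01")`. The delete only runs once the snapshot reports success, so a failed snapshot leaves the index in place.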


Thanks for your reply.

I need to keep a year's worth of data available. The reason I close the indexes is that I'm under the impression (maybe wrongly) that leaving all those indexes searchable will fill up the memory. A year's worth of data takes up quite a bit of space, so I'll need to move some out to an archive area.

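Yes, open indexes will eat some JVM heap. Closing old indexes makes good sense, but closing them and also snapshotting them doesn't necessarily make sense: once an index is in a snapshot you can delete it from the cluster instead of keeping a closed copy on disk.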

Thanks @magnusbaeck. I opted to:

  • Close my indices after 14 days (scripted/cron)
  • I adapted a backup script from https://github.com/imperialwicket/elasticsearch-logstash-index-mgmt (great script; I only adapted it to remove the S3 option). It also creates a restore script, so you can put the index back again if you need the data. Tested, and it works very well.
  • Will use the script to archive off after 90 days.
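For anyone scripting the same schedule, picking out the indices that are past an age limit can be sketched like this. It assumes daily indices named `logstash-YYYY.MM.DD`; the function name is made up, and this is not taken from the script linked above.

```python
from datetime import date, datetime, timedelta


def indices_older_than(names, days, today, prefix="logstash-"):
    """Return the daily index names whose date is more than `days` days before `today`."""
    cutoff = today - timedelta(days=days)
    old = []
    for name in names:
        if not name.startswith(prefix):
            continue
        try:
            day = datetime.strptime(name[len(prefix):], "%Y.%m.%d").date()
        except ValueError:
            # Skip names that do not end in a valid date.
            continue
        if day < cutoff:
            old.append(name)
    return sorted(old)
```

Run it from cron with `days=14` to get the list to close and `days=90` to get the list to archive, passing `date.today()` as `today`.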