Backup and recovery of indexes

I'm setting up my first Elasticsearch cluster, which is used for Logstash indexes, and have reached the point of figuring out backup and recovery.

Using snapshots is not an option for me, as none of the facilities the snapshot feature seems to require (a shared filesystem or some kind of cloud storage) are available. I found some solutions online (one example: http://tech.superhappykittymeow.com/?p=296) which basically do the following (a minimal sketch follows the list):

  • back up one day's Logstash index directory with tar
  • read the mappings via the ES API and store them in a restore script.
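For concreteness, a minimal sketch of that backup approach might look like the following; the index name, the data path (default cluster name "elasticsearch", single node) and the backup destination are assumptions to adjust for your setup:

    #!/bin/bash
    # Sketch of the tar-based daily backup; paths and index name are assumptions.
    INDEX="logstash-$(date -d yesterday +%Y.%m.%d)"
    DATA_DIR="/var/lib/elasticsearch/elasticsearch/nodes/0/indices"
    BACKUP_DIR="/backups/elasticsearch"

    # Save the mappings so a restore script can recreate the index later.
    curl -s "http://localhost:9200/$INDEX/_mapping" > "$BACKUP_DIR/$INDEX-mapping.json"

    # Tar up the on-disk directory for that index.
    tar -czf "$BACKUP_DIR/$INDEX.tar.gz" -C "$DATA_DIR" "$INDEX"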

The restore script then does the following (sketched below):

  • creates a new index using the mappings saved during backup
  • extracts the tar file created during backup
  • restarts Elasticsearch.
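A hedged sketch of that restore sequence, using the same assumed paths as above; note that the exact JSON wrapping returned by the _mapping call differs between ES versions, so the PUT body may need adjusting:

    #!/bin/bash
    # Sketch of the restore steps; index name and paths are assumptions.
    INDEX="logstash-2014.01.01"
    DATA_DIR="/var/lib/elasticsearch/elasticsearch/nodes/0/indices"
    BACKUP_DIR="/backups/elasticsearch"

    # 1. Recreate the index with the saved mappings (the body format may need
    #    unwrapping depending on how your ES version returns the mapping JSON).
    curl -s -XPUT "http://localhost:9200/$INDEX" \
         -d "{\"mappings\": $(cat "$BACKUP_DIR/$INDEX-mapping.json")}"

    # 2. Unpack the tar'd index directory over the newly created one.
    tar -xzf "$BACKUP_DIR/$INDEX.tar.gz" -C "$DATA_DIR"

    # 3. Restart Elasticsearch so it picks up the restored files.
    sudo service elasticsearch restart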

Strangely, this approach doesn't work for me: after restarting Elasticsearch it cannot read the restored index because some files in the index directory are not found.

What I'm trying to understand at this point is not so much "why doesn't it work" as "why is it even done this way", because I have found a procedure that does work for recovery (sketched after the list). Simply:

  • shut down Elasticsearch
  • extract the tar file created during backup
  • start Elasticsearch.
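In script form, roughly (same assumed paths and index name as the sketches above):

    # Sketch of the simpler recovery; paths and index name are assumptions.
    sudo service elasticsearch stop
    tar -xzf "/backups/elasticsearch/logstash-2014.01.01.tar.gz" \
        -C "/var/lib/elasticsearch/elasticsearch/nodes/0/indices"
    sudo service elasticsearch start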

All the documents are there, and Kibana can successfully show the data. So I don't understand why separate steps for dealing with mappings are necessary. What trouble will I get myself into if I just keep tar'ing up the daily index directories?

That blog post is from 2011, which is a long time ago in ES land, so there have likely been a number of changes in how data is stored on disk since then.

You may also end up with a corrupted restore if the index is being written to during the backup, because the files on disk will not be in a consistent state.
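One precaution, if you stick with the tar approach, is to only back up indexes that are no longer being written to (e.g. yesterday's Logstash index) and to flush them first so the translog is committed to the segment files on disk; something like this, with the index name a placeholder:

    # Flush the index so its in-memory buffer and translog are written out to
    # disk segments before the directory is tar'd (index name is a placeholder).
    curl -s -XPOST "http://localhost:9200/logstash-2014.01.01/_flush"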