Restore from Backup and merge with newer data


#1

Hi all,

I've had this up on Stack Exchange for a few days with no response, so I'm hoping I can get some help here since this forum is specific to Elasticsearch!

I'm experimenting with Elasticsearch in relation to backups and restoring data.

I can back up data into a snapshot using curator with no problems.

I then physically delete the files related to the index (to somewhat simulate a HD crash etc.)

I restart Elasticsearch and verify in Kibana that the data is no longer there.

If I then restore the latest snapshot I made, any data stored in Elasticsearch between that last snapshot and the time I do the restore is lost.

Restoring a snapshot doesn't seem to merge with newer data in the indices, and I can't find any references to this problem online. Surely restoring a backup doesn't just throw out newer data, so I must be missing something?

To summarise:

A sample of my snapshots in the backup directory:

snapshot-curator-20150830191221
snapshot-curator-20150901225612
snapshot-curator-20150902090327

which were generated by the following command:

curator snapshot --repository es_backup indices --all-indices

I then delete the files for an index of a specific day:

rm -rf /mnt/storage/var/lib/elasticsearch/elasticsearch/nodes/0/indices/logstash-production-media-2015.09.02

Restart Elasticsearch. (I initially didn't do this and the data was always still there. It seems the Java or machine buffer held onto the data and Elasticsearch didn't realise it was gone!)

Verify in Kibana that that date's data is all gone.

Close all indices:

curator close indices --all-indices

Restore the latest snapshot:

curl -XPOST http://localhost:9200/_snapshot/es_backup/curator-20150729133045/_restore

The deleted data is back when looking in Kibana, but any data put into Elasticsearch between the snapshot being taken and the time of the restore is gone.

e.g. Last snapshot taken at 10am. Restore at 1pm. Data from 10am to 1pm disappears after restore.

So what am I doing wrong? How do I do a restore with a merge of current newer data that has been stored in Elasticsearch since the previous snapshot was taken?

Thanks!
Iain


(Mark Walkom) #2

Snapshot is a point in time copy of the data. When you restore you restore things to that point in time.

There is currently no way to merge like this.


#3

Well I guess that makes sense when you think about the word "snapshot", but maybe not so useful when there's no merge with newer data.

What is the solution? Just back up the files themselves in "/elasticsearch/nodes/0/indices" etc. using a regular backup solution, and not go through Elasticsearch itself?

Thanks,
Iain


(Colin Goodheart-Smithe) #4

You could restore the older data to a different index name and then use an alias which points to both indices to search. Backing up the files in "/elasticsearch/nodes/0/indices" would not help you here, as you still wouldn't be able to merge the backed-up segments with the new segments. It is also very risky to back up using a file copy operation: while you are indexing, segments are constantly being added, merged and deleted, so there is a chance that during your backup you will lose segments and corrupt your backup.
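
A minimal sketch of the restore-under-a-new-name approach, assuming the `es_backup` repository and the index name from the original post. The `restored-` prefix, the `-all` alias name, and the `RUN`/`SNAPSHOT` environment variables are hypothetical names chosen for illustration. It uses the restore API's `indices`, `rename_pattern` and `rename_replacement` options and the `_aliases` endpoint; treat it as a starting point rather than a tested procedure for your cluster.

```shell
#!/bin/sh
# Sketch only: the prefix and alias names below are illustrative assumptions.
ES_URL="${ES_URL:-http://localhost:9200}"
REPO="es_backup"                                  # repository name from the thread
INDEX="logstash-production-media-2015.09.02"      # index that was deleted
PREFIX="restored-"                                # hypothetical prefix for the restored copy
ALIAS="${INDEX}-all"                              # hypothetical alias covering both indices

# JSON body for the restore call: restore only index $1, renamed to <$2><original name>.
restore_body() {
  printf '{"indices":"%s","rename_pattern":"(.+)","rename_replacement":"%s$1"}' "$1" "$2"
}

# JSON body for the alias call: point alias $3 at both index $1 and index $2.
alias_body() {
  printf '{"actions":[{"add":{"index":"%s","alias":"%s"}},{"add":{"index":"%s","alias":"%s"}}]}' \
    "$1" "$3" "$2" "$3"
}

# Only talk to the cluster when explicitly asked to (RUN=1 SNAPSHOT=<name> ./script.sh).
if [ "${RUN:-0}" = "1" ]; then
  : "${SNAPSHOT:?set SNAPSHOT to the name of the snapshot to restore}"

  # Restore the snapshot copy next to the live index instead of over it.
  curl -XPOST "$ES_URL/_snapshot/$REPO/$SNAPSHOT/_restore" \
    -H 'Content-Type: application/json' \
    -d "$(restore_body "$INDEX" "$PREFIX")"

  # Search both the live index and the restored copy through one alias.
  curl -XPOST "$ES_URL/_aliases" \
    -H 'Content-Type: application/json' \
    -d "$(alias_body "$INDEX" "$PREFIX$INDEX" "$ALIAS")"
fi
```

Because the snapshot is restored under a new name, the live index that holds the newer documents is not overwritten. The alias is only useful for searching: documents present in both indices will show up twice, and indexing through an alias that points at more than one index is not supported.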


#5

Yeah, I can appreciate that essentially just copying and pasting files mightn't be the best idea. :slight_smile:

I guess replicating the data over multiple indexes or nodes might be the way to go. Thanks.

