I am currently trying to set up a backup strategy for our Elasticsearch cluster.
I understand Elasticsearch has snapshot/restore functionality that could be used; however, this would require pointing to a file share (AWS, Azure, local, etc.), and I am trying to avoid using more storage if possible, as this is only logging for a single platform.
My question, therefore: in your opinion, could VM backups of the Elasticsearch cluster be used for restore purposes?
In other words, if we keep VM backups and need to look at logs from 4 months ago, could we restore a single cluster node VM from that time period, re-IP it, and then connect Kibana to that cluster to view the logs?
I am trying to avoid using more storage if possible as the VMs themselves are backed up daily so we technically already have backups of the indices.
I do not consider this a valid strategy. The advantage of snapshots inside Elasticsearch is that, no matter how many nodes your data is distributed and sharded across, a snapshot is always a single point-in-time copy of the data at that exact moment. And while new indexing is happening, that point-in-time snapshot remains intact, completely independent of new write operations.
I don't think you can get all of the data aligned like that with a VM backup strategy.
I know it is not really a supported (or perhaps even valid) strategy, but I was thinking of it purely as a way to retrieve old logs for viewing, not to actually restore them.
This is why I thought it could be an option. If I needed to view logs from 120 days ago and I know the server holds 90 days' worth of logs, I could restore a VM snapshot that contains logs covering that date range.
If I then connected Kibana to this VM, am I not correct in saying I would be able to view all indices stored on it? I am really only worried about being able to view logs older than 90 days when they are requested for a specific reason.
For our purposes this would seem better than using additional storage to back up each index daily.
I understand that Elasticsearch has a snapshot/restore capability, but as I mentioned, I never need to restore the data, just view it.
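To make the timing in the 120-days-ago example concrete, here is a small sketch that works out which nightly VM backups should still hold the complete index for a given day. It rests on hypothetical assumptions not stated in the thread: one index per calendar day, the VM backup dated B runs after day B-1 has finished writing, and the retention job only drops the index for day D once it is more than `retention_days` old. The exact boundaries will shift depending on when the retention job and the VM backup actually run.

```python
from datetime import date, timedelta


def vm_backup_window(target_day: date, retention_days: int = 90) -> tuple[date, date]:
    """Return (earliest, latest) VM backup dates expected to contain the
    complete daily index for target_day.

    Hypothetical assumptions: one index per calendar day; the backup dated B
    runs after day B-1 has closed; retention deletes the index for day D
    once it is more than retention_days old.
    """
    # First backup taken after target_day's index stopped receiving writes.
    earliest = target_day + timedelta(days=1)
    # Last backup taken before the retention job removes that index.
    latest = target_day + timedelta(days=retention_days)
    return earliest, latest


if __name__ == "__main__":
    wanted = date.today() - timedelta(days=120)
    first, last = vm_backup_window(wanted)
    print(f"Logs for {wanted}: restore a VM backup dated {first} .. {last}")
```

Under these assumptions, viewing logs from 120 days ago with 90-day retention needs a VM backup taken between 119 and 30 days ago, so the VM backups themselves would have to be retained for at least that long.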
I think snapshots are the perfect solution for this, especially if space is an issue for you. Remember you can compress the snapshots too.
If, for example, you take daily snapshots of your indices, only the delta between days is stored; it's not a full copy of every index each time. If you then needed an index from 120 days ago, you could restore it under a different index name and search it via Kibana.
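A minimal sketch of the snapshot-then-restore-under-a-new-name flow, expressed as the REST requests you would send (method, path, JSON body). It only builds the requests and does not contact a cluster. The repository name `log_backups`, the path `/mnt/es_backups`, the `snap-<index>` naming scheme and the `logs-2024.01.01` index name are all hypothetical. Note that for an `fs` repository the location must also be listed under `path.repo` in `elasticsearch.yml`, and that the repository `compress` setting applies to metadata files (mappings, settings), not to the data files.

```python
import json

REPO = "log_backups"  # hypothetical repository name


def register_repo(location: str) -> tuple[str, str, dict]:
    """Register a shared-filesystem ("fs") snapshot repository."""
    # compress=True compresses snapshot metadata files (mappings, settings).
    body = {"type": "fs", "settings": {"location": location, "compress": True}}
    return "PUT", f"/_snapshot/{REPO}", body


def snapshot_index(index: str) -> tuple[str, str, dict]:
    """Snapshot a single daily index; the snapshot is named after the index."""
    body = {"indices": index, "include_global_state": False}
    return "PUT", f"/_snapshot/{REPO}/snap-{index}?wait_for_completion=true", body


def restore_renamed(index: str) -> tuple[str, str, dict]:
    """Restore an index under a new name so it cannot clash with a live one."""
    body = {
        "indices": index,
        "include_global_state": False,
        "rename_pattern": "(.+)",             # capture the whole index name
        "rename_replacement": "restored-$1",  # e.g. restored-logs-2024.01.01
    }
    return "POST", f"/_snapshot/{REPO}/snap-{index}/_restore", body


if __name__ == "__main__":
    for method, path, body in (
        register_repo("/mnt/es_backups"),
        snapshot_index("logs-2024.01.01"),
        restore_renamed("logs-2024.01.01"),
    ):
        print(method, path)
        print(json.dumps(body, indent=2))
```

Once restored, a Kibana index pattern such as `restored-*` would let you search the old index without touching the live daily indices.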
I think your plan could work in theory, but it's not clean. You'd also have to make sure that the node you restore contains all the necessary shards of the index in question.
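One way to check the "all necessary shards on one node" condition is to look at shard allocation. The sketch below parses text in the shape returned by `GET _cat/shards?h=index,shard,prirep,node` and reports whether a given node holds a copy (primary or replica) of every shard of an index. The node and index names in the sample are hypothetical, node names are assumed to contain no spaces, and the result only reflects allocation at the moment `_cat/shards` was run, not necessarily at the time a VM backup was taken.

```python
SAMPLE = """\
logs-2024.01.01 0 p es-node-1
logs-2024.01.01 0 r es-node-2
logs-2024.01.01 1 p es-node-2
logs-2024.01.01 1 r es-node-1
logs-2024.01.02 0 p es-node-2
"""


def node_has_full_copy(cat_shards: str, index: str, node: str) -> bool:
    """True if `node` holds a copy of every shard number of `index`.

    `cat_shards` is assumed to be the output of
    GET _cat/shards?h=index,shard,prirep,node (unassigned shards have
    no node column).
    """
    shards_seen: set[int] = set()
    shards_on_node: set[int] = set()
    for line in cat_shards.splitlines():
        cols = line.split()
        if len(cols) < 3 or cols[0] != index:
            continue
        shard = int(cols[1])
        shards_seen.add(shard)
        if len(cols) >= 4 and cols[3] == node:
            shards_on_node.add(shard)
    # An index with no listed shards is never considered fully present.
    return bool(shards_seen) and shards_seen == shards_on_node


if __name__ == "__main__":
    print(node_has_full_copy(SAMPLE, "logs-2024.01.01", "es-node-1"))
    print(node_has_full_copy(SAMPLE, "logs-2024.01.02", "es-node-1"))
```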
As I create daily indices, older indices won't change, so there will be no delta on them and only the new index will need to be backed up.
The indices are around 10-13 GB each, so I suppose we wouldn't need too much extra storage to back up a month's worth (around 500 GB).
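A quick back-of-the-envelope check of that 500 GB figure, assuming (as described above) that each daily snapshot adds roughly one full daily index and nothing else is re-copied. Snapshot metadata overhead is ignored, and the 31-day month is a worst-case assumption.

```python
def monthly_snapshot_storage_gb(
    min_index_gb: float, max_index_gb: float, days: int = 31
) -> tuple[float, float]:
    """Rough (low, high) storage for one month of daily-index snapshots.

    Assumes each day's snapshot adds about one full daily index and that
    older indices no longer change; ignores snapshot metadata overhead.
    """
    return min_index_gb * days, max_index_gb * days


if __name__ == "__main__":
    low, high = monthly_snapshot_storage_gb(10, 13)
    print(f"{low:.0f}-{high:.0f} GB per month")  # 310-403 GB per month
```

At 10-13 GB per day this comes to roughly 310-403 GB, which leaves about 100 GB of headroom within a 500 GB allocation.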
If I were to use the VM snapshots for reviewing old logs, it should technically work: the snapshots are daily and only the current day's index would have any changes (relative to the VM snapshot time), so realistically I would only lose data from the day the backup was taken.
Again, I know this is not the supported solution, but I was trying to see whether I could avoid using more storage. We already back up the VMs daily, so I thought I might be able to use those backups rather than spend more storage on a service that is already being backed up.
I suppose I could take the storage hit for a month (using snapshots) and then back that up to tape as monthlies. The storage won't be too much for a month's worth. Now to convince management to give the service another 500 GB.
Thanks for the assistance though. I will go away and revisit, as I want to make sure I have the data backed up in a retrievable manner.