Backup data in a robust way

Robin_Verlangen · January 2, 2013, 9:35am

Hi there,

What would be the best way to create backups of the data in ES? In my use
case the data is append-only (log storage etc.).

I was thinking about closing the index, gzipping the entire folder on every
node (not so easy to coordinate), and storing it on something like Amazon S3.
However, I would like to know whether there are more solid solutions to this.
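A minimal sketch of this per-node approach. It assumes the HTTP API is reachable at `ES_URL` (default `http://localhost:9200`) and that you know each node's data directory; the helper names and paths are hypothetical. `_close` and `_open` are the standard index open/close endpoints. Closing and opening are cluster-wide, so only the archiving step has to run on every node that holds a shard of the index, which is exactly the coordination problem mentioned above.

```shell
#!/usr/bin/env bash
# Hypothetical per-node helpers for the "close, gzip, upload" idea.
# ES_URL and the directory arguments are assumptions; adapt them.

# Close an index (cluster-wide) so its files stop changing.
close_index() {
    curl -s -XPOST "${ES_URL:-http://localhost:9200}/$1/_close"
}

# Re-open the index once every node has written its archive.
open_index() {
    curl -s -XPOST "${ES_URL:-http://localhost:9200}/$1/_open"
}

# Tar + gzip SRC_DIR into DEST_DIR under a timestamped name and
# print the path of the archive that was created.
archive_dir() {
    local src="$1" dest_dir="$2" name
    name="$(basename "$src")-$(date +%Y%m%d%H%M%S).tar.gz"
    tar -czf "$dest_dir/$name" -C "$(dirname "$src")" "$(basename "$src")" || return 1
    echo "$dest_dir/$name"
}
```

For example: run `close_index logs-2013.01.01` once, `archive_dir /var/lib/elasticsearch/data /backups` on each node, then `open_index logs-2013.01.01`, and upload the printed archives with a tool of your choice (e.g. `aws s3 cp` from the AWS CLI).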

Best regards,

Robin Verlangen
Software engineer
W http://www.robinverlangen.nl
E robin@us2.nl

http://goo.gl/Lt7BC

Disclaimer: The information contained in this message and attachments is
intended solely for the attention and use of the named addressee and may be
confidential. If you are not the intended recipient, you are reminded that
the information remains the property of the sender. You must not use,
disclose, distribute, copy, print or rely on this e-mail. If you have
received this message in error, please contact the sender immediately and
irrevocably delete this message and any copies.

--

dadoonet · January 2, 2013, 10:05am

Hey Robin,

I found this gist: "Backup and restore an Elastic search index (shamelessly
copied from http://tech.superhappykittymeow.com/?p=296)"
https://gist.github.com/1939828

I haven't tested it myself.

Have a look also at "Backup Elasticsearch node":
https://gist.github.com/4380715
I haven't tested that one either...

That said, I think that tools covering this need will probably appear in
the next versions.

HTH
David

--
David Pilato
http://www.scrutmydocs.org/
http://dev.david.pilato.fr/
Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs

--

Robin_Verlangen · January 2, 2013, 1:52pm

Hi David,

Thank you for your reply. The second gist is about the same as what I was
thinking of. It seems that I'll have to keep track of my data myself, or
find it through the API.
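One way to find the data through the API is to ask the cluster which nodes hold the primary shards of an index. The sketch below assumes the `_cat/shards` API, which only appeared in Elasticsearch 1.0 (after this thread); on 0.20 the same information is in the routing table returned by `_cluster/state`. The function name and `ES_URL` are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the nodes that hold primary shards of an index,
# i.e. the machines whose data directories you would need to back up.
# Assumes the _cat/shards API (Elasticsearch 1.0 and later).

primary_shard_nodes() {
    local index="$1"
    # h= selects the columns; each output line describes one shard copy:
    #   <index> <shard> <p|r> <node>
    # Unassigned shards have no node column, so require 4 fields.
    curl -s "${ES_URL:-http://localhost:9200}/_cat/shards/$index?h=index,shard,prirep,node" |
        awk '$3 == "p" && NF >= 4 { print $4 }' | sort -u
}
```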

Best regards,

Robin Verlangen

Karussell_2 · January 5, 2013, 8:15pm

As David pointed out, more tools for this will appear in the upcoming
versions.

Until then you can try:

- Flush (and probably disable indexing), then rsync to your backup folder:
  Backup ElasticSearch with rsync · GitHub
- Flush, then snapshot via AWS.
- Use my reindexing plugin to copy a certain index, or only a subset of the
  data, into a completely different cluster. This is probably not efficient
  regarding network IO, as it uses only gzipped JSON rather than binary JSON
  etc.: GitHub - karussell/elasticsearch-reindex: Simple re-indexing. To
  backup, apply index settings changes and more ElasticMagic
- Then there is this plugin, with which you can do the same, probably a bit
  more efficiently as it is tarred + zipped: GitHub -
  jprante/elasticsearch-knapsack: Knapsack plugin is an import/export tool
  for Elasticsearch
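The first option (flush, then rsync) can be sketched as follows. It assumes `ES_URL` points at the cluster and that you run it on each node against that node's data directory; the function names and directories are hypothetical, and pausing indexing is left to you.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of "flush, then rsync to your backup folder".
# ES_URL and the directory arguments are placeholders for your setup.

# Flush commits buffered operations to the Lucene index and clears
# the transaction log, so the segment files on disk are complete.
flush_index() {
    curl -s -XPOST "${ES_URL:-http://localhost:9200}/$1/_flush"
}

# -a preserves permissions and timestamps; --delete makes the copy mirror
# the current set of segment files (merged-away segments are removed).
# Point it at a dedicated backup directory only.
rsync_backup() {
    rsync -a --delete "$1/" "$2/"
}

backup_index_dir() {
    local index="$1" data_dir="$2" backup_dir="$3"
    flush_index "$index" >/dev/null || return 1
    rsync_backup "$data_dir" "$backup_dir"
}
```

Because rsync only transfers changed files, repeated runs against append-only data should mostly copy new segments.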

Regards,
Peter.

Karel_Minarik_2 · January 6, 2013, 7:41am

> I was thinking about closing the index, gzip the entire folder on every
> node (not so easy to coordinate), and store it on something like Amazon S3.
> However I would like to know whether there are more solid solutions to this.

That is essentially correct, but you don't have to close the index. As
others pointed out, you can flush the index
[http://www.elasticsearch.org/guide/reference/api/admin-indices-flush.html],
disable flush for the index
[http://www.elasticsearch.org/guide/reference/api/admin-indices-update-settings.html],
create the archive or rsync to the backup location, and re-enable flush
when that operation is done. You don't have to disable indexing, and you
don't have to try to "synchronize" the gzipping across nodes.
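The procedure described here might look like the following sketch. It assumes the `index.translog.disable_flush` index setting, which was the 0.x/1.x-era way to pause flushing and has since been removed from Elasticsearch; the helper names, `ES_URL` and the directories are hypothetical. If the copy fails, flushing is still re-enabled before the copy's exit status is returned.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: flush, disable flush, copy the files, re-enable flush.
# Relies on index.translog.disable_flush (old Elasticsearch versions only).

# es METHOD PATH [BODY]: small wrapper around curl for the HTTP API.
es() {
    if [ -n "$3" ]; then
        curl -s -X "$1" "${ES_URL:-http://localhost:9200}$2" -d "$3"
    else
        curl -s -X "$1" "${ES_URL:-http://localhost:9200}$2"
    fi
}

# set_disable_flush INDEX true|false
set_disable_flush() {
    es PUT "/$1/_settings" "{\"index.translog.disable_flush\":$2}"
}

backup_with_flush_disabled() {
    local index="$1" data_dir="$2" backup_dir="$3" status
    es POST "/$index/_flush" >/dev/null || return 1
    set_disable_flush "$index" true >/dev/null || return 1
    # Indexing may continue; new operations stay in the translog
    # until flushing is enabled again.
    rsync -a "$data_dir/" "$backup_dir/"
    status=$?
    # Always re-enable flushing, even if the copy failed.
    set_disable_flush "$index" false >/dev/null
    return "$status"
}
```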

In the near future, there will be a snapshot/restore API in Elasticsearch
itself.
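For reference, a sketch of the snapshot/restore API as it shipped in Elasticsearch 1.0. The repository name, location and snapshot name are placeholders, the helper names are hypothetical, and newer versions additionally require the repository location to be listed under `path.repo` in `elasticsearch.yml`.

```shell
#!/usr/bin/env bash
# Hypothetical helpers around the snapshot/restore API (Elasticsearch 1.0+).

# es_put PATH BODY: PUT a JSON body to the cluster at ES_URL.
es_put() {
    curl -s -XPUT -H 'Content-Type: application/json' \
        "${ES_URL:-http://localhost:9200}$1" -d "$2"
}

# Register a shared-filesystem repository; the location must be
# mounted at the same path on every node.
register_fs_repo() {
    es_put "/_snapshot/$1" "{\"type\":\"fs\",\"settings\":{\"location\":\"$2\"}}"
}

# Snapshot the given (comma-separated) indices and wait for completion.
snapshot_indices() {
    es_put "/_snapshot/$1/$2?wait_for_completion=true" "{\"indices\":\"$3\"}"
}
```

For example, `register_fs_repo my_backup /mnt/backups` followed by `snapshot_indices my_backup snap_1 logs-2013.01.01`; a snapshot is restored with `POST /_snapshot/my_backup/snap_1/_restore`. Unlike copying data directories, the cluster itself coordinates the per-node work.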

Karel

--

radu_gheorghe · January 7, 2013, 8:26am

Hello,

Below is yet another gist. I think the more, the better.

https://gist.github.com/radu-gheorghe/3180985

log_backup.bash

#!/usr/bin/env bash

###############FUNCTIONS############

function prepare {
    #optimize the index
    echo -n "Optimizing index $INDEX_NAME..."
    curl -XPOST "$ADDRESS/$INDEX_NAME/_optimize" 2>/dev/null| grep 'failed":0' >/dev/null
    if [ $? -eq 0 ]; then
        echo "done"

(truncated; see the gist for the full script)

log_restore.bash

#!/usr/bin/env bash

###############FUNCTIONS############

function get_arguments {
    for ARGUMENT in "$@"; do
        case "$ARGUMENT" in
            -h|--help)
                echo "This will restore indices from backup to Elasticsearch."
                echo

(truncated; see the gist for the full script)

This is especially useful if you have time-based data and want to optimize
your indices before "archiving" them. Optimizing makes better use of your
disk space, and searches on the optimized indices are faster.
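A minimal sketch of the optimize step, checking the response for `"failed":0` the same way the `log_backup.bash` excerpt does. It assumes the `_optimize` endpoint with `max_num_segments=1` (the operation was renamed `_forcemerge` in Elasticsearch 2.1); the function name and `ES_URL` are hypothetical. Merging is expensive, so it is meant for indices that no longer receive writes.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: merge a no-longer-written index down to one segment
# before archiving it. Uses the old _optimize endpoint.

optimize_index() {
    local response
    response="$(curl -s -XPOST "${ES_URL:-http://localhost:9200}/$1/_optimize?max_num_segments=1")" || return 1
    # Treat the call as successful only if no shard reported a failure.
    case "$response" in
        *'"failed":0'*) echo "optimized $1" ;;
        *) echo "optimize failed for $1: $response" >&2; return 1 ;;
    esac
}
```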

Best regards,
Radu

http://sematext.com/ -- Elasticsearch -- Solr -- Lucene

jprante · January 7, 2013, 11:17am

Just a disclaimer: while it is easy to package your Elasticsearch _source
documents into a tar.gz archive with the knapsack plugin, it is not
intended as a backup/restore solution, and you cannot expect it to work
correctly in all situations. Precautions should be taken. For example, when
the index is "hot" (still receiving writes), the tool cannot guarantee that
it captures a consistent set of docs.
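One possible precaution, sketched under the assumption that the data is append-only (as in the original question): record the index's document count before and after the export and treat a change as a sign that the export may be inconsistent. This is a heuristic, not a guarantee: `_count` only sees refreshed documents, and updates or deletes can leave the count unchanged. The function names are hypothetical, and the export command is whatever you use.

```shell
#!/usr/bin/env bash
# Hypothetical precaution for exporting a possibly "hot" index.

# Print the document count of an index. _count returns JSON such as
# {"count":42,"_shards":{...}}; extract the number.
doc_count() {
    curl -s "${ES_URL:-http://localhost:9200}/$1/_count" |
        sed -n 's/.*"count":\([0-9][0-9]*\).*/\1/p'
}

# export_if_stable INDEX COMMAND [ARGS...]
# Runs the export command and fails if the count changed meanwhile.
export_if_stable() {
    local index="$1" before after
    shift
    before="$(doc_count "$index")"
    "$@" || return 1
    after="$(doc_count "$index")"
    if [ -z "$before" ] || [ "$before" != "$after" ]; then
        echo "count changed during export ($before -> $after); export may be inconsistent" >&2
        return 1
    fi
}
```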

Jörg

--

Topic		Replies	Views
Best way to backup Elasticsearch	2	837	July 5, 2017
Options to Backup data from ElasticSearch Elasticsearch	3	750	July 5, 2017
Best way to backup the elastic search data(Entire Nodes Folder) from one machine to another machine? Elasticsearch	4	3944	December 4, 2017
Creating and storing ES indices on S3 Elasticsearch	2	731	July 6, 2017
ElasticSearch 1.0 Manual Backup Elasticsearch	4	568	July 6, 2017

Backup data in a robust way

Best regards, Radu

Related topics

Best regards,
Radu