Backup data in a robust way

Hi there,

What would be the best way to create backups of the data in ES? In my use
case the data is append-only (log storage etc).

I was thinking about closing the index, gzipping the entire folder on every
node (not so easy to coordinate), and storing it on something like Amazon S3.
However, I would like to know whether there are more solid solutions to this.

Best regards,

Robin Verlangen
Software engineer
W http://www.robinverlangen.nl
E robin@us2.nl

http://goo.gl/Lt7BC

Disclaimer: The information contained in this message and attachments is
intended solely for the attention and use of the named addressee and may be
confidential. If you are not the intended recipient, you are reminded that
the information remains the property of the sender. You must not use,
disclose, distribute, copy, print or rely on this e-mail. If you have
received this message in error, please contact the sender immediately and
irrevocably delete this message and any copies.

--

Hey Robin,

I found this Gist: https://gist.github.com/1939828

I never tested it myself.

Also have a look at: https://gist.github.com/4380715
I have not tested that one either.

That said, I think that some tools will appear in the next versions and
will probably cover this need.

HTH
David

--
David Pilato
http://www.scrutmydocs.org/
http://dev.david.pilato.fr/
Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs

--

Hi David,

Thank you for your reply. The second gist is much the same as what I had in
mind. It seems that I'll have to keep track of my data, or find it
through the API.

Best regards,

Robin Verlangen
Software engineer

--

As David pointed out, more tools for this will appear in upcoming
versions.

Until then you can try:

Regards,
Peter.

--

I was thinking about closing the index, gzip the entire folder on every
node (not so easy to coordinate), and store it on something like Amazon S3.
However I would like to know whether there are more solid solutions to this.

That is essentially correct, but you don't have to close the index. As others
pointed out, you can flush the index
[http://www.elasticsearch.org/guide/reference/api/admin-indices-flush.html],
disable flush for the index
[http://www.elasticsearch.org/guide/reference/api/admin-indices-update-settings.html],
create the archive or rsync the files to the backup location, and re-enable
flush when that operation is done. You don't have to disable indexing, and you
don't have to "synchronize" the gzipping across nodes.
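
The sequence above could be scripted roughly as follows. This is only a sketch, not a tested backup tool. It assumes a node reachable at localhost:9200, that flushing is toggled through the index.translog.disable_flush setting, and that rsync is installed. The names backup_index, set_flush_disabled and copy_files, as well as the paths in the usage note, are made up for illustration.

```python
import json
import subprocess
import urllib.request

ES_URL = "http://localhost:9200"  # assumption: node on the default HTTP port


def es_request(method, path, body=None):
    """Send one JSON request to Elasticsearch and return the decoded reply."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    req = urllib.request.Request(ES_URL + path, data=data, method=method)
    if data is not None:
        req.add_header("Content-Type", "application/json")
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))


def set_flush_disabled(index, disabled, request=es_request):
    # Assumption: flushing is controlled by index.translog.disable_flush,
    # changed through the update-settings API.
    return request("PUT", "/%s/_settings" % index,
                   {"index": {"translog.disable_flush": disabled}})


def copy_files(src, dest):
    # rsync in archive mode; raises CalledProcessError if the copy fails.
    subprocess.check_call(["rsync", "-a", src, dest])


def backup_index(index, data_dir, dest, request=es_request, copy=copy_files):
    """Flush, disable flush, copy the files, then re-enable flush."""
    request("POST", "/%s/_flush" % index)
    set_flush_disabled(index, True, request)
    try:
        copy(data_dir, dest)
    finally:
        # Re-enable flushing even if the copy failed.
        set_flush_disabled(index, False, request)
```

A call might look like backup_index("logs", "/path/to/index/data/", "/path/to/backup/"), run on each node against that node's own data directory. The try/finally is there so that a failed copy does not leave flushing disabled.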

In the near future, there will be a snapshot/restore API in Elasticsearch
itself.

Karel

--

Hello,

Below is yet another gist. I'm thinking the more, the better :)

This is especially useful if you have time-based data, and you'd want to
optimize your indices before "archiving" them. This would make better use
of your disk space and your searches would be faster.
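
For time-based data, picking the indices to optimize before archiving might look like the sketch below. It is an illustration only and is not the content of the gist. It assumes daily indices named like logs-2013.01.02 (the "logs-" prefix and the 7-day window are made up) and that the optimize API is reachable as POST /{index}/_optimize with a max_num_segments parameter.

```python
from datetime import date, datetime, timedelta


def indices_to_optimize(index_names, today, keep_days=7, prefix="logs-"):
    """Return the daily indices that are older than keep_days, sorted by name."""
    cutoff = today - timedelta(days=keep_days)
    old = []
    for name in index_names:
        if not name.startswith(prefix):
            continue
        try:
            day = datetime.strptime(name[len(prefix):], "%Y.%m.%d").date()
        except ValueError:
            continue  # suffix is not a date, so this is not a daily index
        if day < cutoff:
            old.append(name)
    return sorted(old)


def optimize_path(index):
    # Assumption: merging down to one segment is requested with
    # POST /{index}/_optimize?max_num_segments=1
    return "/%s/_optimize?max_num_segments=1" % index


names = ["logs-2013.01.01", "logs-2012.12.20", "other-2012.12.01", "logs-latest"]
for name in indices_to_optimize(names, today=date(2013, 1, 6)):
    print("POST " + optimize_path(name))
# prints: POST /logs-2012.12.20/_optimize?max_num_segments=1
```

Only indices that no longer receive writes are worth optimizing this way, which is why the sketch skips anything newer than the cutoff.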

Best regards,
Radu

http://sematext.com/ -- ElasticSearch -- Solr -- Lucene

--

Just a disclaimer: while it is easy to package your Elasticsearch _source
documents into a tar.gz archive with the knapsack plugin, it is not
intended as a backup/restore solution, and you cannot expect it to work
correctly in all situations. Precautions should be taken. For example, when
the index is "hot", the tool cannot guarantee that it captures a consistent
set of docs.

Jörg

On Saturday, January 5, 2013 9:15:21 PM UTC+1, Karussell wrote:

--