JSON Backups

Hello -

I've seen various solutions to this, but am wondering what additional
solutions are out there. For DR purposes, we're looking into taking JSON
backups of the ES data on a regular basis. Using Tire and the scroll/scan
API, full backups can be done, as well as incrementals/differentials if
you specify a time range.

The problem is that full backups are exceptionally slow. What are your
solutions for doing backups to a 'raw' format?
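For reference, the scan/scroll approach can be sketched without Tire as plain HTTP calls; everything below (host, index name, the `@timestamp` field) is a placeholder for illustration, not our actual setup:

```python
import json
import urllib.request

# Placeholders for illustration, not a real setup.
ES = "http://localhost:9200"
INDEX = "myindex"
TS_FIELD = "@timestamp"   # assumed timestamp field for incrementals

def build_query(since=None, until=None):
    """match_all for a full dump, or a range filter for an incremental."""
    if since is None and until is None:
        return {"query": {"match_all": {}}}
    rng = {}
    if since is not None:
        rng["gte"] = since
    if until is not None:
        rng["lt"] = until
    return {"query": {"range": {TS_FIELD: rng}}}

def _post(url, body):
    req = urllib.request.Request(url, body,
                                 {"Content-Type": "application/json"})
    return json.load(urllib.request.urlopen(req))

def dump(out, since=None, until=None):
    """Stream every _source as one JSON line into the file object `out`."""
    # search_type=scan skips scoring; size is per shard.
    res = _post(ES + "/" + INDEX + "/_search?search_type=scan&scroll=5m&size=500",
                json.dumps(build_query(since, until)).encode())
    scroll_id = res["_scroll_id"]
    while True:
        # The 0.x-era scroll endpoint takes the scroll id as the raw body.
        res = _post(ES + "/_search/scroll?scroll=5m", scroll_id.encode())
        hits = res["hits"]["hits"]
        if not hits:
            break
        for hit in hits:
            out.write(json.dumps(hit["_source"]) + "\n")
        scroll_id = res["_scroll_id"]
```

Passing `since`/`until` turns the same loop into an incremental/differential dump; without them it walks the whole index.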

Thanks

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

I would be very interested in this as well. Today we rely on the fact that
we rebuild the index every morning, so we keep the last one around, and in
case of error we just flip the alias back to the old one.
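The flip itself is a single atomic `_aliases` call; a minimal sketch, with hypothetical index and alias names:

```python
import json
import urllib.request

ES = "http://localhost:9200"  # assumed endpoint

def swap_actions(alias, old_index, new_index):
    """Body for POST /_aliases: remove + add applied in one atomic request."""
    return {"actions": [
        {"remove": {"index": old_index, "alias": alias}},
        {"add":    {"index": new_index, "alias": alias}},
    ]}

def flip_alias(alias, old_index, new_index):
    body = json.dumps(swap_actions(alias, old_index, new_index)).encode()
    req = urllib.request.Request(ES + "/_aliases", body,
                                 {"Content-Type": "application/json"})
    return json.load(urllib.request.urlopen(req))
```

Because both actions go in one request, searches against the alias never see a moment where it points at neither index.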

But as we move to an incremental approach, I'm not worried about the backup
per se, because we rely on replication, and to lose data, something really
bad would have to happen on a very large number of nodes.

What I am worried about is that we (the developers) could push a bug into
prod that hurts the index (say we mess something up and change a lot of
documents incorrectly). I would really like to have some sort of delayed
backup (24 hrs old) of my current index.

We are still spinning our heads on the best way of doing it here.

Regards

On Tuesday, February 5, 2013 3:37:07 PM UTC-5, bra...@infochimps.com wrote:

Note: AFAIK, the Elasticsearch team is working on a full-fledged
backup/restore solution that uses the cluster's internal structures to
ensure reliability.

For my personal devel purposes, I'm using my knapsack plugin
(https://github.com/jprante/elasticsearch-knapsack) to pack JSON _source
fields into tar.gz archives. These archives can also be indexed again (I
use this to test various index settings and mappings, and for
post-processing in other apps). But while exporting documents, the
index must be silent (inactive).
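The packing idea itself (this is not the plugin's own code, just the general shape of it) fits in a few lines of Python: one tar member per document, the whole archive gzipped.

```python
import io
import json
import tarfile

def pack_sources(docs, path):
    """Write each doc's JSON _source as one member of a tar.gz archive.
    `docs` is an iterable of (doc_id, source_dict) pairs."""
    with tarfile.open(path, "w:gz") as tar:
        for doc_id, source in docs:
            data = json.dumps(source).encode()
            info = tarfile.TarInfo(name=doc_id + ".json")
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))

def unpack_sources(path):
    """Read the archive back into a {doc_id: source_dict} mapping."""
    out = {}
    with tarfile.open(path, "r:gz") as tar:
        for member in tar.getmembers():
            out[member.name[:-len(".json")]] = json.load(tar.extractfile(member))
    return out
```

Round-tripping through `unpack_sources` is what makes the archive re-indexable later.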

"Full backups are exceptionally slow" - I'm puzzled, can you give some
numbers you observe? With Java API bulk indexing, I can go up to 5-6
MB/second on an average Intel Sandy Bridge single desktop PC node (some
thousand docs per second).

Jörg

On 05.02.13 at 23:08, Vinicius Carvalho wrote:

Hi Jörg -

I've looked at knapsack as well; unfortunately, our indexes are not silent.
I'll look at it again to see if I can utilize it somehow.

"Extremely slow" may have been a little too strong; dumping ~8 million
documents to a gzipped JSON file took approximately 2.5 hours. This in
itself is not a problem, but as the number of documents grows, full backups
will become increasingly cumbersome.
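For what it's worth, the arithmetic behind those numbers:

```python
docs = 8_000_000
seconds = 2.5 * 3600       # 2.5 hours
rate = docs / seconds      # documents per second
print(round(rate))         # prints 889
```

That is roughly 900 docs/sec - well below the "some thousand docs per second" you quote, although a scan/scroll dump and bulk indexing aren't directly comparable workloads.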

Thanks,

On Tuesday, February 5, 2013 5:03:25 PM UTC-6, Jörg Prante wrote:
