Backup Policies for ES

Hi Team,

I am trying to follow this backup script: https://gist.github.com/karussell/1074906

I have a few questions:

  1. We are using 5 clusters, and each cluster has 3 nodes.
  2. Do I need to rsync only the "data" directory? Is it enough to take the
     backup from any one node of each cluster? And can I recover the cluster
     from that data directory if a crash corrupts some data?
  3. The script disables flush to stop indexing records to that node (am I
     right?). What happens to any documents indexed during that time?

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

Hello Vijay,

On Wed, Jul 3, 2013 at 4:26 PM, Vijay Prabhakar <vsubbaraj@sirahu.com> wrote:

Hi Team,

I am trying to follow this backup script: https://gist.github.com/karussell/1074906

I have a few questions:

  1. We are using 5 clusters, and each cluster has 3 nodes.
  2. Do I need to rsync only the "data" directory? Is it enough to take the
     backup from any one node of each cluster? And can I recover the cluster
     from that data directory if a crash corrupts some data?

Yes, although to be on the safe side I'd shut down all 3 nodes, restore all
3 backups, and then restart the nodes. It's nice to have your data
consistent throughout the cluster.
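For instance, a restore along those lines could be sketched like this. The
hostnames, paths, and service name are placeholders, not your actual setup;
with DRY_RUN=1 the script only prints the commands it would run:

```shell
#!/bin/sh
# Hypothetical restore sketch: stop all 3 nodes, put each node's backed-up
# "data" directory back in place, then restart. Adjust names/paths to your
# environment before using this for real.
NODES="es-node1 es-node2 es-node3"
DATA_DIR="/var/lib/elasticsearch/data"
BACKUP_DIR="/backups/es"
DRY_RUN=1   # 1 = just print the commands; 0 = actually run them

run() {
  if [ "$DRY_RUN" = "1" ]; then echo "$@"; else "$@"; fi
}

# Stop every node first, so no node rewrites its data dir mid-restore.
for n in $NODES; do
  run ssh "$n" service elasticsearch stop
done

# Copy each node's backup back into place while the cluster is down.
for n in $NODES; do
  run rsync -a --delete "$BACKUP_DIR/$n/" "$n:$DATA_DIR/"
done

# Restart the nodes; they will recover from the restored data directories.
for n in $NODES; do
  run ssh "$n" service elasticsearch start
done
```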

  3. The script disables flush to stop indexing records to that node (am I
     right?). What happens to any documents indexed during that time?

No, indexing still works; only flushing those indexed docs to disk is
temporarily suspended. Normally, indexing happens in memory, which is very
fast, and every once in a while those changes are flushed to the actual
Lucene index on disk, according to your transaction log settings
(http://www.elasticsearch.org/guide/reference/index-modules/translog/).

When you disable flushing, this writing to disk doesn't happen until you
enable flushing again. If a lot of indexing happens while you back up, you
could run out of memory, but I've never heard anyone complain about that; it
depends on how much indexing usually happens and how much memory you have.
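In shell terms, the core of that gist boils down to something like the
sketch below. The host and paths are placeholders, and
`index.translog.disable_flush` is the 0.90-era dynamic setting, so
double-check the setting name against your Elasticsearch version; with
DRY_RUN=1 the script only prints what it would do:

```shell
#!/bin/sh
# Sketch of the backup flow: suspend flushing, rsync the data directory,
# re-enable flushing. Indexing keeps working in memory (and the translog)
# while the on-disk index files stay quiescent for the copy.
ES="http://localhost:9200"
DATA_DIR="/var/lib/elasticsearch/data"
BACKUP_DIR="/backups/es"
DRY_RUN=1   # 1 = just print the commands; 0 = actually run them

run() {
  if [ "$DRY_RUN" = "1" ]; then echo "$@"; else "$@"; fi
}

# 1. Suspend flushing so Lucene segment files on disk stop changing.
run curl -s -XPUT "$ES/_settings" -d '{"index":{"translog.disable_flush":true}}'

# 2. Copy the now-stable index files.
run rsync -a "$DATA_DIR/" "$BACKUP_DIR/"

# 3. Resume flushing so buffered changes reach the on-disk index again.
run curl -s -XPUT "$ES/_settings" -d '{"index":{"translog.disable_flush":false}}'
```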

Best regards,
Radu

http://sematext.com/ -- Elasticsearch -- Solr -- Lucene


Hi Radu,

Thanks for the info.

Regarding backup, we are using 3 nodes (2 nodes as replicas), so it's enough
to take the backup from any one node, right?

index.number_of_shards: 1
index.number_of_replicas: 2

Thanks,
Vijay Prabhakar.


Hello Vijay,

Yes, that should be enough, but I'd test the whole backup & recovery
process before relying on it :slight_smile:
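A minimal drill for that test might look like the following. The host is a
placeholder, waiting for "green" assumes all replicas can be allocated on
the test cluster, and with DRY_RUN=1 the script only prints the commands:

```shell
#!/bin/sh
# After restoring a scratch cluster from the backup, wait for it to go
# green and spot-check the document count against the count recorded when
# the backup was taken.
ES="http://localhost:9200"
DRY_RUN=1   # 1 = just print the commands; 0 = actually run them

run() { if [ "$DRY_RUN" = "1" ]; then echo "$@"; else "$@"; fi; }

# wait_for_status=green blocks (up to the timeout) until all shards,
# including the 2 replicas per shard, are assigned.
run curl -s "$ES/_cluster/health?wait_for_status=green&timeout=60s"

# Total doc count across the cluster; compare it with the pre-backup count.
run curl -s "$ES/_count"
```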

On Thu, Jul 4, 2013 at 9:52 AM, Vijay Prabhakar <vsubbaraj@sirahu.com> wrote:

Regarding backup, we are using 3 nodes (2 nodes as replicas), so it's enough
to take the backup from any one node, right?


--
http://sematext.com/ -- Elasticsearch -- Solr -- Lucene
