Backup Policies for ES

Hi Team,

I am trying to follow this backup script: https://gist.github.com/karussell/1074906

I have a few questions:

  1. We are using 5 clusters, and each cluster has 3 nodes.
  2. Do I need to rsync only the "data" directory? Is it enough to take the
     backup from any one node of each cluster? And can I recover the cluster
     from that data directory if a crash corrupts some data?
  3. The script disables flush to stop indexing records to that node (am I
     right?). What happens to any documents indexed during that time?

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

Hello Vijay,

On Wed, Jul 3, 2013 at 4:26 PM, Vijay Prabhakar <vsubbaraj@sirahu.com> wrote:

Hi Team,

I am trying to follow this backup script: https://gist.github.com/karussell/1074906

I have a few questions:

  1. We are using 5 clusters, and each cluster has 3 nodes.
  2. Do I need to rsync only the "data" directory? Is it enough to take the
     backup from any one node of each cluster? And can I recover the cluster
     from that data directory if a crash corrupts some data?

Yes, although to be on the safe side I'd shut down all 3 nodes, restore all
3 backups, and then restart the nodes. It's nice to have your data
consistent throughout the cluster.
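For instance, a restore along those lines could be sketched like this. The
hostnames, paths, and service name are placeholders, not your actual setup;
with DRY_RUN=1 the script only prints the commands it would run:

```shell
#!/bin/sh
# Hypothetical restore sketch: stop all 3 nodes, put each node's backed-up
# "data" directory back in place, then restart. Adjust names/paths to your
# environment before using this for real.
NODES="es-node1 es-node2 es-node3"
DATA_DIR="/var/lib/elasticsearch/data"
BACKUP_DIR="/backups/es"
DRY_RUN=1   # 1 = just print the commands; 0 = actually run them

run() {
  if [ "$DRY_RUN" = "1" ]; then echo "$@"; else "$@"; fi
}

# Stop every node first, so no node rewrites its data dir mid-restore.
for n in $NODES; do
  run ssh "$n" service elasticsearch stop
done

# Copy each node's backup back into place while the cluster is down.
for n in $NODES; do
  run rsync -a --delete "$BACKUP_DIR/$n/" "$n:$DATA_DIR/"
done

# Restart the nodes; they will recover from the restored data directories.
for n in $NODES; do
  run ssh "$n" service elasticsearch start
done
```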

  3. The script disables flush to stop indexing records to that node (am I
     right?). What happens to any documents indexed during that time?

No, indexing still works; only flushing those indexed docs to disk is
temporarily suspended. Normally, indexing happens in memory, which is very
fast, and every once in a while those changes are flushed to the actual
Lucene index on disk, according to your transaction log settings
(http://www.elasticsearch.org/guide/reference/index-modules/translog/).

When you disable flushing, this writing to disk doesn't happen until you
enable flushing again. If a lot of indexing happens while you back up, you
could run out of memory, but I've never heard anyone complain about that; it
depends on how much indexing usually happens and how much memory you have.
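In shell terms, the core of that gist boils down to something like the
sketch below. The host and paths are placeholders, and
`index.translog.disable_flush` is the 0.90-era dynamic setting, so
double-check the setting name against your Elasticsearch version; with
DRY_RUN=1 the script only prints what it would do:

```shell
#!/bin/sh
# Sketch of the backup flow: suspend flushing, rsync the data directory,
# re-enable flushing. Indexing keeps working in memory (and the translog)
# while the on-disk index files stay quiescent for the copy.
ES="http://localhost:9200"
DATA_DIR="/var/lib/elasticsearch/data"
BACKUP_DIR="/backups/es"
DRY_RUN=1   # 1 = just print the commands; 0 = actually run them

run() {
  if [ "$DRY_RUN" = "1" ]; then echo "$@"; else "$@"; fi
}

# 1. Suspend flushing so Lucene segment files on disk stop changing.
run curl -s -XPUT "$ES/_settings" -d '{"index":{"translog.disable_flush":true}}'

# 2. Copy the now-stable index files.
run rsync -a "$DATA_DIR/" "$BACKUP_DIR/"

# 3. Resume flushing so buffered changes reach the on-disk index again.
run curl -s -XPUT "$ES/_settings" -d '{"index":{"translog.disable_flush":false}}'
```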

Best regards,
Radu

http://sematext.com/ -- Elasticsearch -- Solr -- Lucene


Hi Radu,

Thanks for the info.

Regarding backup, we are using 3 nodes (2 nodes as replicas), so it's enough
to take the backup from any one node, right?

index.number_of_shards: 1
index.number_of_replicas: 2

Thanks,
Vijay Prabhakar.


Hello Vijay,

Yes, that should be enough, but I'd test the whole backup & recovery
process before relying on it :slight_smile:
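A minimal drill for that test might look like the following. The host is a
placeholder, waiting for "green" assumes all replicas can be allocated on
the test cluster, and with DRY_RUN=1 the script only prints the commands:

```shell
#!/bin/sh
# After restoring a scratch cluster from the backup, wait for it to go
# green and spot-check the document count against the count recorded when
# the backup was taken.
ES="http://localhost:9200"
DRY_RUN=1   # 1 = just print the commands; 0 = actually run them

run() { if [ "$DRY_RUN" = "1" ]; then echo "$@"; else "$@"; fi; }

# wait_for_status=green blocks (up to the timeout) until all shards,
# including the 2 replicas per shard, are assigned.
run curl -s "$ES/_cluster/health?wait_for_status=green&timeout=60s"

# Total doc count across the cluster; compare it with the pre-backup count.
run curl -s "$ES/_count"
```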

On Thu, Jul 4, 2013 at 9:52 AM, Vijay Prabhakar <vsubbaraj@sirahu.com> wrote:

Regarding backup, we are using 3 nodes (2 nodes as replicas), so it's enough
to take the backup from any one node, right?


--
http://sematext.com/ -- Elasticsearch -- Solr -- Lucene
