I have been studying the local gateway for two days, but I still don't know how to use it.
Can somebody explain it to me?
My questions are:
1. Where is the local gateway's data stored? How is the gateway data generated?
2. How can I simulate recovering a cluster from the local gateway's data?
Local gateway is the default, so you don't need to do anything special to
use it.
By default, your data is stored in /var/lib/elasticsearch/ if you use the
Debian package. You can change that by starting ES with something like
-Des.default.path.data=/my/data/path. If you use the Debian package, you
can simply edit /etc/init.d/elasticsearch where it says:

# Elasticsearch data directory
DATA_DIR=/var/lib/$NAME

Otherwise, you can add something like this to bin/elasticsearch.in.sh:

JAVA_OPTS="$JAVA_OPTS -Des.default.path.data=/my/data/path"
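The in.sh approach can be sketched end to end; /my/data/path here is a hypothetical mount point, so substitute your own:

```shell
# Appended to bin/elasticsearch.in.sh -- /my/data/path is a hypothetical
# mount point; replace it with wherever your data disk is mounted.
DATA_PATH=/my/data/path

# Add the system property to whatever JAVA_OPTS already contains.
JAVA_OPTS="${JAVA_OPTS:-} -Des.default.path.data=$DATA_PATH"

# Quick sanity check that the property made it into the JVM options.
echo "$JAVA_OPTS"
```

Restart the node afterwards and check that new indices show up under the new path.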
On Thursday, October 11, 2012 11:30:48 AM UTC+3, bing wrote:
> I study local gateway for two days, but still don't know how to use it
> Can somebody explain it to me?
> My question is :
> 1 where are data of local gateway put? how to generate data of gateway
> 2 how can I simulate a case to recover cluster based on data of local
> gateway
Thanks Radu for the reply. Here are a few thoughts of mine regarding backup and restore.
When all replicas of a given shard are gone (a multiple-node failure scenario), we lose that shard's data, and the cluster health will of course turn from green to red.
So, we definitely need to have some sort of external backup of the data of each node.
A simple solution (considering AWS) would be to attach an EBS volume to each node automatically via CloudFormation when the cluster spins up, then rsync from the data dir ($ES_HOME/data) to the EBS volume. The rsync has two steps. In the first step, we rsync without disabling flush. In the second step, we disable flush and rsync again. The idea is that the first step may take a while, and we don't want to disable flush for that long, so we first rsync whatever we can; the second pass should be quick, so it's OK to disable flush for it. Then we take periodic snapshots of these EBS volumes to S3, say nightly.
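The two-step rsync above can be sketched roughly as below. The paths and ES URL are assumptions, and `index.translog.disable_flush` is the pre-1.x dynamic index setting for pausing flushes as I recall it, so verify it against your version. The sketch defaults to a dry run that only prints the commands; set DRY_RUN=0 to actually execute them:

```shell
#!/bin/sh
# Two-step rsync backup sketch. DATA_DIR, BACKUP_DIR and ES_URL are
# hypothetical -- adjust for your cluster. Defaults to a dry run.
DATA_DIR=${DATA_DIR:-/var/lib/elasticsearch}
BACKUP_DIR=${BACKUP_DIR:-/mnt/ebs-backup}
ES_URL=${ES_URL:-http://localhost:9200}
DRY_RUN=${DRY_RUN:-1}

# run() prints the command in dry-run mode, executes it otherwise.
run() { if [ "$DRY_RUN" = "1" ]; then echo "$@"; else "$@"; fi; }

# Step 1: bulk copy while flushes continue normally (may take a while).
run rsync -a --delete "$DATA_DIR/" "$BACKUP_DIR/"

# Step 2: pause flushes, copy the small remaining delta, resume flushes.
run curl -s -XPUT "$ES_URL/_settings" -d '{"index.translog.disable_flush": true}'
run rsync -a --delete "$DATA_DIR/" "$BACKUP_DIR/"
run curl -s -XPUT "$ES_URL/_settings" -d '{"index.translog.disable_flush": false}'
```

Re-enabling flush in the last step matters even if the second rsync fails, so in a real script you would want a trap to guarantee it.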
As per the AWS mechanism, if an EBS volume was attached to a node by CloudFormation, the volume will be destroyed if the node is terminated for some reason (e.g. autoscaling), which is actually great for not flooding our AWS account with unused EBS volumes. When autoscaling brings up a replacement node, a new EBS volume will be attached to it by CloudFormation in the same fashion.
For restore, we would use these nightly EBS snapshots to handle failure scenarios such as "all replicas are gone for a given shard", "the whole cluster is gone", "the data center is destroyed", etc. The restoration process is pretty straightforward, so I won't go into detail on it.
Hope Elasticsearch will come up with a robust backup solution really soon.
Or we convince the ES team not to deprecate the S3 gateway, because it provides a much-needed service that works perfectly fine for some users. I've been using it as a backup solution since 2010.
On Sat, 2013-03-30 at 05:44 -0700, James Cook wrote:
> Or we convince the ES team to not deprecate the S3 gateway because it
> provides a much needed service perfectly fine for some users. I've
> been using it as a backup solution since 2010.
If you like that, then you'll love this: the backup solution that is coming will knock the socks off the existing S3 gateway. You'll be pleased we did it.
> A simple solution (considering AWS) would be attaching an EBS volume to each
> node automatically by cloudformation when the cluster spins up. (...) Then
> we take some periodical snapshots, lets say nightly snapshot of these EBS
> volumes to S3.
Just a note: with EBS volumes, it shouldn't be necessary to disable flush,
as you can take a "point-in-time" snapshot.
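A point-in-time snapshot is a single AWS CLI call; the volume ID below is a hypothetical placeholder (this requires AWS credentials, so it is only a sketch):

```shell
# vol-0123456789abcdef0 is a placeholder volume ID -- use your data volume's.
# EBS snapshots are point-in-time, so no flush toggling is needed beforehand.
aws ec2 create-snapshot \
  --volume-id vol-0123456789abcdef0 \
  --description "nightly ES backup $(date +%F)"
```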
> As per AWS mechanism, if an EBS volume was attached to a node by
> cloudformation, the EBS volume will be destroyed if the node is terminated
> by some reason (e.g. autoscaling) which is actually great (...)
If you're OK with using EBS volumes for data persistence, it's surprising
to describe this behaviour as "great"; it's precisely the fact that they
survive instances which makes them attractive. You can create a snapshot
periodically as a backup, and when you want to restore, you just create a
volume from the snapshot. You can detach/attach the volume to a different
instance, etc.
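The restore path described above is also just two AWS CLI calls; all the IDs below are hypothetical placeholders (again a credential-requiring sketch, not runnable as-is):

```shell
# Create a fresh volume from the nightly snapshot (placeholder IDs; the
# availability zone must match the target instance's).
aws ec2 create-volume \
  --snapshot-id snap-0123456789abcdef0 \
  --availability-zone us-east-1a

# Attach the new volume to the replacement instance, then mount it and
# point path.data at the mount before starting Elasticsearch.
aws ec2 attach-volume \
  --volume-id vol-0123456789abcdef0 \
  --instance-id i-0123456789abcdef0 \
  --device /dev/sdf
```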