Since S3 Gateway has been officially deprecated, how should one maintain
cluster persistence while running on Amazon cloud?
I can think of the following (a rough sketch in code follows the list):
- Once in a while, issue a flush, then disable_flush on a node.
- rsync the index data to a backup folder.
- Sync the backup folder to S3 in the background, into a folder named after the node.
- Do that for every node.
That way the whole cluster is backed up to S3 as of the most recent snapshot point.
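Roughly, per node, something like this Python sketch; the paths, bucket name, the pre-1.0 index.translog.disable_flush setting, and the aws CLI being on the PATH are all assumptions on my side:

import subprocess
import requests

ES = "http://localhost:9200"
DATA_DIR = "/var/lib/elasticsearch/data"   # assumed data path
BACKUP_DIR = "/mnt/backup/elasticsearch"   # local staging folder
S3_TARGET = "s3://my-es-backups/node-1/"   # hypothetical bucket, one prefix per node

def toggle_flush(disabled):
    # Pause or resume automatic translog flushing on all indices
    # (the old "index.translog.disable_flush" setting mentioned above).
    requests.put(ES + "/_settings",
                 json={"index": {"translog": {"disable_flush": disabled}}}).raise_for_status()

# Force a flush so the on-disk segments are current, then pause further flushes.
requests.post(ES + "/_flush").raise_for_status()
toggle_flush(True)
try:
    # Copy the index files into the staging folder while flushing is paused.
    subprocess.run(["rsync", "-a", "--delete", DATA_DIR + "/", BACKUP_DIR + "/"],
                   check=True)
finally:
    toggle_flush(False)

# Push the staged copy to S3 in the background; repeat this script on every node.
subprocess.Popen(["aws", "s3", "sync", "--delete", BACKUP_DIR, S3_TARGET])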
But if I have 20 nodes in the cluster, restoring it from scratch will be a
lot of manual work.
Another way I can see:
- Run on EBS.
- Periodically flush, then disable_flush on a node.
- sync the filesystem.
- Create an EBS snapshot.
- enable_flush again (see the sketch right after this list).
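In code that per-node cycle might look like this; again only a sketch, and the volume ID, region and the disable_flush setting are assumptions:

import subprocess
import boto3
import requests

ES = "http://localhost:9200"
VOLUME_ID = "vol-0123456789abcdef0"   # hypothetical: the EBS volume under the data path

ec2 = boto3.client("ec2", region_name="us-east-1")

def toggle_flush(disabled):
    # Pause or resume automatic translog flushing (pre-1.0 disable_flush setting).
    requests.put(ES + "/_settings",
                 json={"index": {"translog": {"disable_flush": disabled}}}).raise_for_status()

requests.post(ES + "/_flush").raise_for_status()   # flush once
toggle_flush(True)                                  # then pause flushing
try:
    subprocess.run(["sync"], check=True)            # flush OS buffers down to the EBS volume
    snap = ec2.create_snapshot(VolumeId=VOLUME_ID,
                               Description="es-node-1 periodic backup")
    print("started snapshot", snap["SnapshotId"])
finally:
    toggle_flush(False)                             # re-enable flushing right away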
But still:
- Need to take care of pruning older snapshots.
- Restoring still looks like a manual pain.
So what is the advised practice of running multi-node cluster on AWS with
ability to recover from cluster sudden death?
Can I still go with S3 gateway if I'll take particular precautions that
someone can outline?
Since S3 Gateway has been officially deprecated, how should one maintain
cluster persistence while running on Amazon cloud?
EBS :), preferably with Provisioned IOPS.
Another way I can see:
I believe you don't have to disable flush with EBS snapshots, since they
should be point-in-time snapshots. Some documentation is at [1].
So what is the advised practice of running multi-node cluster on AWS with
ability to recover from cluster sudden death?
Create EBS snapshots at intervals which make economic sense to you
(hourly, daily, weekly?). To test your disaster recovery plan, just
hard-terminate your nodes. Then create new EBS volumes from your
snapshots, launch new EC2 instances, attach those volumes, and voilà, the
cluster should be fine.
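The restore side can be scripted as well. A rough boto3 sketch; all IDs and the device name below are made up, and it assumes the replacement instances are already running:

import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

# Map each replacement instance to the snapshot taken from the node it replaces.
restore_plan = {
    "i-0aaa111122223333a": "snap-0123456789abcdef0",
    "i-0bbb444455556666b": "snap-0fedcba9876543210",
}

for instance_id, snapshot_id in restore_plan.items():
    # The new volume has to live in the same availability zone as its instance.
    reservation = ec2.describe_instances(InstanceIds=[instance_id])["Reservations"][0]
    az = reservation["Instances"][0]["Placement"]["AvailabilityZone"]

    # Create a fresh volume from the snapshot and wait until it is usable.
    volume = ec2.create_volume(SnapshotId=snapshot_id, AvailabilityZone=az)
    ec2.get_waiter("volume_available").wait(VolumeIds=[volume["VolumeId"]])

    # Attach it where the Elasticsearch data path expects it, then start the node.
    ec2.attach_volume(VolumeId=volume["VolumeId"], InstanceId=instance_id,
                      Device="/dev/sdf")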
It's a good idea to have a process like this automated, of course. See the
Elasticsearch Chef tutorial [2] for a detailed walkthrough of one
possibility.
Need to take care of pruning older snapshots
With a good library such as Fog for Ruby [3], it's really easy to have it
all automated and nifty. There are many scripts on the internet for
inspiration; a web search for "automate ebs snapshots fog" turns up plenty.
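If Ruby isn't your thing, the same pruning idea works with boto3 in Python. A sketch; the description filter and the 30-day retention are just examples:

import datetime
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")
cutoff = datetime.datetime.now(datetime.timezone.utc) - datetime.timedelta(days=30)

# Look only at snapshots this account owns and that the backup job created.
snapshots = ec2.describe_snapshots(
    OwnerIds=["self"],
    Filters=[{"Name": "description", "Values": ["es-node-* periodic backup"]}],
)["Snapshots"]

for snap in snapshots:
    if snap["StartTime"] < cutoff:
        ec2.delete_snapshot(SnapshotId=snap["SnapshotId"])
        print("deleted", snap["SnapshotId"], "created", snap["StartTime"])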
Can I still go with S3 gateway if I'll take particular precautions that
someone can outline?
The official Elasticsearch advice, and my own advice based on personal
experience, is: don't do that. There are adventurous people who enjoy
some thrill, though.
Karel, thank you for your definitive points.
The path is clear now.
Zaar
Have you tried the "create EBS snapshot whenever, without any ES flushing
or pausing, and then recover ES from those EBS snapshots" approach yourself,
or has anyone you know and really trust tried it?
I believe you don't have to disable flush with EBS snapshots, since they should be point-in-time snapshots. Some documentation is at [1].
Have you tried the "create EBS snapshot whenever, without any ES flushing or pausing, and then recover ES from those EBS snapshots" approach yourself, or has anyone you know and really trust tried it?
As said, I "believe" EBS snapshots are point-in-time. I have, in fact, successfully recovered from an EBS snapshot without previously disabling flush on a running system. But since it is hard to simulate all the variables and possibilities in play, disabling flush seems like a sane precaution to me.