So, with the deprecation of S3 gateway, what is the current best approach to cluster persistence?

Hello,

Since S3 Gateway has been officially deprecated, how should one maintain
cluster persistence while running on Amazon cloud?

I can think of:

  1. Once in a while, issue a flush, then disable_flush on a node.
  2. rsync the index data to a backup folder.
  3. Sync the backup folder to S3 in the background, into a folder named
     after the node.
  4. Do that for every node.

That way the cluster is backed up to S3, current as of the latest
snapshot point. But if I have 20 nodes in the cluster, restoring it from
scratch will be a lot of manual work.
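For illustration, the steps above might be scripted per node roughly like this. The host names, paths, and bucket are hypothetical, and the index.translog.disable_flush setting is from the pre-1.0 Elasticsearch API, so check your version:

```python
# Sketch of the per-node backup procedure described above.
# All names (hosts, paths, bucket) are hypothetical, and the
# index.translog.disable_flush setting is version-specific.

def backup_commands(node, data_dir="/var/lib/elasticsearch",
                    backup_dir="/backup/elasticsearch",
                    bucket="s3://my-es-backups"):
    """Return the shell commands to back up a single node, in order."""
    es = "http://%s:9200" % node
    return [
        # 1. Flush, then stop further automatic flushes.
        "curl -XPOST %s/_flush" % es,
        "curl -XPUT %s/_settings -d '{\"index.translog.disable_flush\": true}'" % es,
        # 2. Copy the index data to a local backup folder.
        "rsync -a %s/ %s/" % (data_dir, backup_dir),
        # 3. Sync the backup folder to S3, under the node's name.
        "s3cmd sync %s/ %s/%s/" % (backup_dir, bucket, node),
        # 4. Re-enable automatic flushing.
        "curl -XPUT %s/_settings -d '{\"index.translog.disable_flush\": false}'" % es,
    ]

for cmd in backup_commands("es-node-01"):
    print(cmd)
```

With 20 nodes this loop is exactly the manual work mentioned above, which is why it wants automating.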

Another way I can see:

  1. Run on EBS
  2. Periodically flush/disable_flush on a node
  3. sync
  4. create EBS snapshot
  5. enable_flush

But still

  1. Need to take care of pruning older snapshots.
  2. Restoring still looks like a manual pain.

So what is the advised practice for running a multi-node cluster on AWS
with the ability to recover from sudden cluster death?
Can I still go with the S3 gateway if I take particular precautions that
someone can outline?

Best regards,
Zaar

--

> Since S3 Gateway has been officially deprecated, how should one maintain
> cluster persistence while running on Amazon cloud?

EBS :), preferably with Provisioned IOPS

> Another way I can see:

I believe you don't have to disable flush with EBS snapshots, since they
should be point-in-time snapshots. Some documentation is at [1].

> So what is the advised practice of running multi-node cluster on AWS with
> ability to recover from cluster sudden death?

Create EBS snapshots at intervals which make economical sense to you
(hourly, daily, weekly?). To exercise your disaster recovery plan, just
hard-terminate your nodes. Then create new EBS volumes from your
snapshots, launch new EC2 instances, and attach those volumes; voila,
the cluster should be fine.
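As a rough sketch, that restore path could be expressed as a small per-node plan. The snapshot IDs, AMI, and device name here are made up, and the modern aws CLI is used purely for illustration:

```python
# Sketch of the restore path: new volume from snapshot, new instance,
# attach. Snapshot IDs, AMI, and device name are hypothetical.

def restore_commands(snapshot_by_node, zone="us-east-1a",
                     ami="ami-12345678", device="/dev/sdf"):
    """Return, per node, the ordered commands to rebuild it from a snapshot."""
    plan = {}
    for node, snap in snapshot_by_node.items():
        plan[node] = [
            "aws ec2 create-volume --snapshot-id %s --availability-zone %s" % (snap, zone),
            "aws ec2 run-instances --image-id %s --count 1" % ami,
            # the volume and instance IDs come back from the two calls above
            "aws ec2 attach-volume --volume-id $VOL --instance-id $INST --device %s" % device,
        ]
    return plan
```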

It's a good idea to have a process like this automated, of course. See the
Elasticsearch Chef tutorial [2] for a detailed walkthrough of one
possibility.

> Need to take care of older snapshots pruning

With a good library such as Fog for Ruby [3], it's really easy to have it
all automated and nifty. There are many scripts on the internet for
inspiration: https://www.google.com/search?q=automate+ebs+snapshots+fog&oq=automate+ebs+snapshots+fog
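The pruning itself is simple enough to sketch without any library: keep the newest N snapshots, delete the rest. A minimal, dependency-free version of what such a Fog or boto script would do (the snapshot IDs and timestamps are invented):

```python
# Retention logic for snapshot pruning: keep the newest `keep`
# snapshots, return the IDs of everything older, oldest first.

def snapshots_to_prune(snapshots, keep=7):
    """snapshots: list of (snapshot_id, created_at) tuples."""
    by_age = sorted(snapshots, key=lambda s: s[1])  # oldest first
    if len(by_age) <= keep:
        return []
    return [snap_id for snap_id, _ in by_age[:-keep]]

# e.g. 21 daily snapshots, keep a week's worth:
snaps = [("snap-%03d" % d, "2013-01-%02d" % d) for d in range(1, 22)]
print(snapshots_to_prune(snaps))  # the 14 oldest IDs
```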

> Can I still go with S3 gateway if I'll take particular precautions that
> someone can outline?

The official Elasticsearch advice, and my own advice based on personal
experience, is: don't do that. There are adventurous people who enjoy
some thrill, though :slight_smile:

Karel

[1] http://stackoverflow.com/questions/6469556/amazon-ebs-snapshots-as-incremental-backups
[2] http://www.elasticsearch.org/tutorials/2012/03/21/deploying-elasticsearch-with-chef-solo.html
[3] http://fog.io/about/getting_started.html

--

Karel, thank you for your definite points.
The path is clear now.

Zaar


--

Hi Karel,

Have you tried the "create EBS snapshot whenever, without any ES flushing
or pausing, and then recover ES from those EBS snapshots" approach yourself
or anyone you know and reeeeally trust? :slight_smile:

Thanks,
Otis

Solr & ElasticSearch Support

On Mon, Jan 21, 2013 at 3:22 AM, Karel Minařík <karel.minarik@gmail.com> wrote:


--


> I believe you don't have to disable flush with EBS snapshots, since they should be point-in-time snapshots. Some documentation is at [1].

> Have you tried the "create EBS snapshot whenever, without any ES flushing or pausing, and then recover ES from those EBS snapshots" approach yourself or anyone you know and reeeeally trust? :slight_smile:

As I said, I "believe" EBS snapshots are point-in-time. I have, in fact, successfully recovered from an EBS snapshot without previously disabling flush on a running system. But since it is hard to simulate all the variables and possibilities in play, disabling flush seems like a sane precaution to me.

Karel

--

Karel,
Do I have to also issue "flush" after "disable_flush" to make sure
that everything is in sync?

Zaar

On 22 Jan 2013, at 20:45, "Karel Minařík" <karel.minarik@gmail.com> wrote:


--
