ElasticSearch on AWS - disaster recovery / total cluster failure


(utel) #1

Hello,

I have a few questions around recovering from total cluster failure and
would be grateful for any advice in this area.

We have a topology of:

  • 3 x EC2 instances running elasticsearch
  • each EC2 instance has an attached long-lived EBS volume for the data
    directory ( /var/lib/elasticsearch )
  • discovery (unicast) is via the elasticsearch-cloud-aws plugin

I've seen the following posts from earlier this year on how to handle
disaster recovery / total cluster failure:
http://elasticsearch-users.115913.n3.nabble.com/So-with-the-deprecation-of-S3-gateway-what-is-the-current-best-approach-to-cluster-persistence-td4028476.html
http://elasticsearch-users.115913.n3.nabble.com/Index-Backups-to-S3-td4030215.html

In brief, the suggestion is to snapshot all the EBS volumes periodically
(presumably the more frequently the better), and for disaster recovery:

  • create new EBS volumes from the snapshots
  • spin up new EC2 instances + attach EBS volumes
  • start up elasticsearch
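
The steps above can be sketched as a small planning helper. This is only an illustration under assumptions, not a tested recovery tool: the `Snapshot` record, the node names and the snapshot IDs are hypothetical, and the actual AWS calls (creating volumes, launching instances, attaching volumes) are left out and would go through whichever AWS SDK or CLI you use. It picks the newest completed snapshot per node and lists the recovery steps in order.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, Iterable, List


@dataclass(frozen=True)
class Snapshot:
    """Hypothetical record describing one EBS snapshot."""
    snapshot_id: str    # e.g. "snap-new" (made-up ID)
    volume_id: str      # EBS volume the snapshot was taken from
    node_name: str      # assumed tag naming the elasticsearch node
    started: datetime   # when the snapshot was started
    state: str          # "completed", "pending" or "error"


def latest_completed(snapshots: Iterable[Snapshot]) -> Dict[str, Snapshot]:
    """Return the newest completed snapshot for each node."""
    newest: Dict[str, Snapshot] = {}
    for snap in snapshots:
        if snap.state != "completed":
            continue  # never restore from an unfinished or failed snapshot
        current = newest.get(snap.node_name)
        if current is None or snap.started > current.started:
            newest[snap.node_name] = snap
    return newest


def restore_plan(snapshots: Iterable[Snapshot],
                 data_dir: str = "/var/lib/elasticsearch") -> List[str]:
    """List the recovery steps per node, in the order given in the post."""
    plan: List[str] = []
    for node, snap in sorted(latest_completed(snapshots).items()):
        plan.append(f"{node}: create EBS volume from {snap.snapshot_id}")
        plan.append(f"{node}: launch EC2 instance and attach the volume")
        plan.append(f"{node}: mount at {data_dir} and start elasticsearch")
    return plan
```

Calling `restore_plan(...)` with the snapshot records for all three nodes gives three steps per node; a node with no completed snapshot is simply absent from the plan, which is worth checking for before relying on it.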

Is this still the recommended approach for DR / total cluster failure?
Or are there alternative strategies / improvements that have become
available since?
(those posts were coincident with elasticsearch 0.20.x and we're now at
0.90.5)

Also, would anyone know of new features around backup and restore of the
entire cluster that are due in version 1.0?

Thanks.

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.


(Igor Motov) #2

Yes, not much changed between 0.20.x and 0.90.5 in this regard but we are
planning to add a new feature to 1.0 that will make backup and restore
easier: https://github.com/elasticsearch/elasticsearch/issues/3826
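
As a rough illustration of what such a feature looks like in use, the backup and restore support that eventually shipped in 1.0 is the snapshot and restore REST API. The sketch below only builds the requests and prints nothing to the cluster; the repository name, snapshot name, backup location and host are assumptions chosen for the example, and the helper functions are hypothetical, not part of any client library.

```python
import json
from typing import Dict, List, Tuple

# (HTTP method, path, JSON body); an empty dict means no request body
Request = Tuple[str, str, Dict]


def backup_requests(repo: str, snapshot: str, location: str) -> List[Request]:
    """Requests that register a shared-filesystem repository and snapshot into it."""
    return [
        # Register the repository; "fs" is the shared file system repository type
        ("PUT", f"/_snapshot/{repo}",
         {"type": "fs", "settings": {"location": location}}),
        # Take a snapshot and wait until it has finished
        ("PUT", f"/_snapshot/{repo}/{snapshot}?wait_for_completion=true", {}),
    ]


def restore_request(repo: str, snapshot: str) -> Request:
    """Request that restores a previously taken snapshot."""
    return ("POST", f"/_snapshot/{repo}/{snapshot}/_restore", {})


def as_curl(request: Request, host: str = "http://localhost:9200") -> str:
    """Render a request as a curl command line (host is an assumption)."""
    method, path, body = request
    cmd = f"curl -X{method} '{host}{path}'"
    if body:
        cmd += f" -d '{json.dumps(body)}'"
    return cmd
```

For example, `as_curl(restore_request("my_backup", "snapshot_1"))` renders the restore call as a single curl command, which is handy for checking the paths before running anything against a real cluster.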


(utel) #3

Thanks very much, Igor.

That's exactly what we would be after.

Is there a rough timeline for when 1.0 would be available?
If it's in the order of weeks, we'll look to use the new feature.
If it's in the order of months, we may use one of the methods described in
the earlier posts.

Thanks again.


(Igor Motov) #4

I think it's more likely the latter (months). However, betas of 1.0 might
appear much sooner. So, if you are waiting for an official 1.0 release, I
would probably advise using one of the existing methods for now.


(utel) #5

Thanks for the clarifications on the timelines.

