ElasticSearch on AWS - disaster recovery / total cluster failure

utel · October 4, 2013, 2:57pm

Hello,

I have a few questions around recovering from total cluster failure and
would be grateful for any advice in this area.

We have a topology of:

- 3 x EC2 instances running elasticsearch
- each EC2 instance has an attached long-lived EBS volume for the data
  directory (/var/lib/elasticsearch)
- discovery (unicast) is via the elasticsearch-cloud-aws plugin

I've seen the following posts from earlier this year on how to handle
disaster recovery / total cluster failure:
http://elasticsearch-users.115913.n3.nabble.com/So-with-the-deprecation-of-S3-gateway-what-is-the-current-best-approach-to-cluster-persistence-td4028476.html
http://elasticsearch-users.115913.n3.nabble.com/Index-Backups-to-S3-td4030215.html

In brief, the suggestion is to snapshot all the EBS volumes periodically
(presumably the more frequently the better) and, for disaster recovery:

1. create new EBS volumes from the snapshots
2. spin up new EC2 instances + attach EBS volumes
3. start up elasticsearch
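
The snapshot and restore steps above could be scripted roughly as follows. This is only a sketch under assumptions: the thread does not name a tool, so the boto3 EC2 client is assumed here, and the function names, the plan layout and the /dev/sdf device name are hypothetical. The `ec2` argument is expected to be something like `boto3.client("ec2")`.

```python
from typing import Any, Dict, List


def snapshot_volumes(ec2: Any, volume_ids: List[str], note: str) -> List[str]:
    """Start one EBS snapshot per data volume and return the snapshot ids."""
    snapshot_ids = []
    for volume_id in volume_ids:
        response = ec2.create_snapshot(VolumeId=volume_id, Description=note)
        snapshot_ids.append(response["SnapshotId"])
    return snapshot_ids


def restore_volumes(ec2: Any, plan: List[Dict[str, str]],
                    device: str = "/dev/sdf") -> List[str]:
    """Create a volume from each snapshot and attach it to a new instance.

    Each plan entry (hypothetical layout) needs the keys
    snapshot_id, availability_zone and instance_id.
    """
    new_volume_ids = []
    for entry in plan:
        volume = ec2.create_volume(
            SnapshotId=entry["snapshot_id"],
            AvailabilityZone=entry["availability_zone"],
        )
        volume_id = volume["VolumeId"]
        # A new volume has to be in the "available" state before attaching.
        ec2.get_waiter("volume_available").wait(VolumeIds=[volume_id])
        ec2.attach_volume(
            VolumeId=volume_id,
            InstanceId=entry["instance_id"],
            Device=device,
        )
        new_volume_ids.append(volume_id)
    return new_volume_ids
```

Launching the replacement instances, mounting each volume at the data directory and starting elasticsearch are left out of the sketch. The volume has to sit in the same availability zone as the instance it is attached to.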

Is this still the recommended approach for DR / total cluster failure?
Or are there alternative strategies / improvements that have become
available since?
(those posts were coincident with elasticsearch 0.20.x and we're now at
0.90.5)

Also would anyone know of new features around backup and restore of the
entire cluster that are due in version 1.0?

Thanks.

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

Igor_Motov · October 4, 2013, 6:09pm

Yes, not much has changed between 0.20.x and 0.90.5 in this regard, but we
are planning to add a new feature to 1.0 that will make backup and restore
easier: Snapshot/Restore API - Phase I, Issue #3826
(https://github.com/elasticsearch/elasticsearch/issues/3826)
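
For orientation, here is a rough sketch of the REST calls involved, in the form the snapshot/restore API was later documented for 1.0. It only builds the requests and does not send them. The repository, bucket and snapshot names are hypothetical, and the "s3" repository type assumes a cloud-aws plugin version that provides it.

```python
import json
from typing import List, Tuple

# (HTTP method, path, JSON body); an empty body means no request body.
Request = Tuple[str, str, str]


def snapshot_requests(repo: str, bucket: str, snapshot: str) -> List[Request]:
    """Requests to register an S3 repository and take one snapshot into it."""
    register = {"type": "s3", "settings": {"bucket": bucket}}
    return [
        ("PUT", "/_snapshot/%s" % repo, json.dumps(register)),
        ("PUT", "/_snapshot/%s/%s?wait_for_completion=true" % (repo, snapshot), ""),
    ]


def restore_request(repo: str, snapshot: str) -> Request:
    """Request to restore a snapshot from the repository into the cluster."""
    return ("POST", "/_snapshot/%s/%s/_restore" % (repo, snapshot), "")
```

After a total cluster failure, the new cluster would register the same repository first and then issue the restore request.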

utel · October 5, 2013, 8:26am

Thanks very much, Igor.

That's exactly what we would be after.

Is there a rough timeline for when 1.0 would be available?
If it's in the order of weeks, we'll look to use the new feature.
If it's in the order of months, we may use one of the methods described in
the earlier posts.

Thanks again.

Igor_Motov · October 6, 2013, 3:10pm

I think it's more likely the latter. However, the betas of 1.0 might
appear much sooner. So, if you are waiting for an official 1.0 release, I
would advise using one of the existing methods for now.

utel · October 7, 2013, 6:12am

Thanks for the clarifications on the timelines.
