Gateway snapshots and thinking of a DR site

Is there a way to have two gateways in Elasticsearch? Say, a local FS
gateway for normal operations and standard redundancy, plus, say, an S3
gateway for DR-style purposes?

I'm guessing many people use ES to index database content, so I'm wondering,
for those people, who presumably (hopefully) have a DR strategy, how they're
ensuring their ES DR instance is as close as possible to ready-to-go,
matching the DB that's in their DR site too?

I'd love to have some way of doing a nightly snapshot to, say, S3 or some
other remote location, different from the standard gateway, to give some
point-in-time recovery options.

any ideas?

Paul

There isn't a concept of two gateways, and (at least currently) there isn't a plan to add one, because I would like to solve this in a different manner.

There are two aspects being conflated here. The first is backups (and recovering from them), in which case backing up the data directory of each node works well with the local gateway.

The second is multi-site replication. This is solved today by using messaging between the two sites and applying the same changes to two different ES clusters; in the future, it is planned to be a built-in option in ES.
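As a rough illustration of the messaging approach, a sketch along these lines applies every change to both clusters (the cluster URLs and document names here are made up, and real code would sit behind a durable message bus so a dead site can catch up later):

    import json
    import urllib.request

    # Both clusters receive every change; a failed request should be
    # re-queued on the bus rather than dropped.
    CLUSTERS = ["http://primary-es:9200", "http://dr-es:9200"]  # assumed hosts

    def index_everywhere(index, doc_type, doc_id, doc):
        body = json.dumps(doc).encode("utf-8")
        for base in CLUSTERS:
            req = urllib.request.Request(
                "%s/%s/%s/%s" % (base, index, doc_type, doc_id),
                data=body,
                headers={"Content-Type": "application/json"},
                method="PUT",
            )
            urllib.request.urlopen(req)  # real code would retry / re-queue on failure

    index_everywhere("orders", "order", "1", {"status": "shipped"})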

On Wednesday, 4 May 2011, Shay Banon <shay.banon@elasticsearch.com> wrote:

There are two aspects being conflated here. The first is backups (and recovering from them), in which case backing up the data directory of each node works well with the local gateway.

So with a local gateway there is no snapshotting available, as I
understand from reading the docs. Is there a way to create a
quiescent period where one can guarantee the index on disk is flushed
and valid, ready for copying? One needs to be able to create a valid
point-in-time snapshot of all shards before resuming on-disk updates.

I think the same is required for the shared FS gateway option too.
It snapshots periodically for local data-center redundancy, but if
one needs to replicate this state to another DC location, we need to be
able to pause any new snapshotting to get a quick, valid directory tree
ready for replication.

This could be done either with the simple directory-tree hard-link
recursion that Doug Cutting proposed way back around Lucene 1.4, if I
recall, or, if one has a SAN with filesystem snapshotting, one can
use that.
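(A minimal sketch of that hard-link recursion in Python, with illustrative paths; it is near-instant because nothing is actually copied, and Lucene's write-once segment files make the linked tree a stable copy:)

    import os

    def hardlink_snapshot(src, dst):
        # Walk the live index tree and hard-link every file into a
        # parallel snapshot tree.
        for root, dirs, files in os.walk(src):
            target = os.path.join(dst, os.path.relpath(root, src))
            os.makedirs(target, exist_ok=True)
            for name in files:
                os.link(os.path.join(root, name), os.path.join(target, name))

    hardlink_snapshot("/var/data/elasticsearch/nodes/0", "/backups/es-snapshot")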

Either way, this still needs a quiet period to ensure files are fsync'd
and logically consistent with respect to the index contents.

In the case of the local gateway, would setting the flush frequency to
0 give this quiescent period for these sorts of operations? And then
resume flushing to continue the writes.

The second is multi-site replication. This is solved today by using messaging between the two sites and applying the same changes to two different ES clusters; in the future, it is planned to be a built-in option in ES.

That will be awesome to have built in. However, with any replication
strategy, planning for failure is a must. Imagine a replication stream
breaks somehow, for whatever crazy reason; in the total worst case, say
the DR site blows up and one is forced to create it anew. One would
need to take a valid primary copy of the index to rebuild the DR site and
get the replication stream going again.

In this scenario you need the same ability as above if you want to
keep the primary going. Obviously shutting down ES is an option
here too, but...

Has anyone had to replicate a large index to another cluster while the
primary is still up?

Paul Smith

Heya,

Regarding the copying option, yes, you can set the translog settings (they can be changed dynamically now in 0.16) to high values, and then do the copying over (maybe flush before). This will make sure that no new Lucene commit point is written to disk.
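Roughly, that dance could look like the following sketch (the translog setting names are what I recall for 0.16, so treat them as assumptions and verify against the docs for your version):

    import json
    import urllib.request

    ES = "http://localhost:9200"

    def put_index_settings(settings):
        req = urllib.request.Request(
            ES + "/_settings",
            data=json.dumps({"index": settings}).encode("utf-8"),
            headers={"Content-Type": "application/json"},
            method="PUT",
        )
        urllib.request.urlopen(req)

    # 1. Raise the flush triggers so no new Lucene commit point is written.
    put_index_settings({
        "translog.flush_threshold_ops": 1000000000,
        "translog.flush_threshold_size": "10gb",
        "translog.flush_threshold_period": "1000h",
    })

    # 2. One last manual flush so the on-disk state is complete.
    urllib.request.urlopen(urllib.request.Request(ES + "/_flush", method="POST"))

    # 3. Copy the data directory to the remote location.

    # 4. Put the settings back to their normal values.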

As for cross-DC (slow connection) setups, the solution will have to handle complete failures, including the ability to copy whole indices back to the original data center. This will be handled automatically (in a similar fashion to how a replica can recover from a primary shard when it is allocated).

-shay.banon

On 5 May 2011 06:13, Shay Banon shay.banon@elasticsearch.com wrote:

Heya,

Regarding the copying option, yes, you can set the translog settings
(they can be changed dynamically now in 0.16) to high values, and then do
the copying over (maybe flush before). This will make sure that no new
Lucene commit point is written to disk.

Excellent! So if the three translog settings are all changed to some
ridiculously large number, ensuring none of the conditions triggers a flush,
and a manual flush API call is then done, it is safe to replicate the data
directory to another location and then reset the settings.

As for cross-DC (slow connection) setups, the solution will have to handle
complete failures, including the ability to copy whole indices back to the
original data center. This will be handled automatically (in a similar
fashion to how a replica can recover from a primary shard when it is
allocated).

That's one hell of a feature; can't wait. We have one DC in the Middle East,
with the DR currently in Sydney, and unfortunately the network link is
unreliable, high latency, generally a PITA. (We plan to address that with
another location, but it's a factor in our interim rollout.)

Many thanks for taking the time to answer this, and so many other questions.
You're a machine to be able to keep this community supported AND do all that
awesome development at the same time. Really appreciate it.

Paul

On Thursday, May 5, 2011 at 2:08 AM, Paul Smith wrote:

Excellent! So if the three translog settings are all changed to some ridiculously large number, ensuring none of the conditions triggers a flush, and a manual flush API call is then done, it is safe to replicate the data directory to another location and then reset the settings.

Yes. It can be simplified; we can add a disable-flush flag that can be set dynamically. Open an issue?

That's one hell of a feature; can't wait. We have one DC in the Middle East, with the DR currently in Sydney, and unfortunately the network link is unreliable, high latency, generally a PITA.

That's a big one :). I'm first going to start with support for a single cluster across fast-connected sites (with smart shard allocation that takes the site topology into account).

Many thanks for taking the time to answer this, and so many other questions.

Thanks!

Yes. It can be simplified; we can add a disable-flush flag that can be set
dynamically. Open an issue?

Simplified Disable Flush operation · Issue #906 · elastic/elasticsearch · GitHub

thanks!

Paul
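(If the flag lands as a dynamic index setting, say a hypothetical index.translog.disable_flush, the quiesce/copy/resume dance above would collapse to a single toggle:)

    import json
    import urllib.request

    def set_disable_flush(value):
        # Hypothetical flag from issue #906; the setting name is assumed.
        req = urllib.request.Request(
            "http://localhost:9200/_settings",
            data=json.dumps({"index": {"translog.disable_flush": value}}).encode("utf-8"),
            headers={"Content-Type": "application/json"},
            method="PUT",
        )
        urllib.request.urlopen(req)

    set_disable_flush(True)   # quiesce: no new commit points
    # ... copy the data directory ...
    set_disable_flush(False)  # resume normal flushing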

I would also love to see multi-DC replication.
Would it also be possible to have a cluster using local storage
for speed, and the replica using EC2 storage for persistence?

Michel


Which replica are you referring to? The replica "cluster"? And what is EC2 storage, EBS/S3?

Oh, I meant S3 instead of EC2. To rephrase my question: would it be
possible to use an ES cluster with local storage which updates the
replicas of the data on S3 storage, so that in the event that
something goes wrong with the local storage, one can recover
transparently from the S3 copy, creating a new local store?


The shared S3 gateway can be the main data store for the primary cluster. If the cluster fails, you can start another cluster which will use that data. The problem is that it will take time to download all the data from S3 before the secondary cluster is operational.

In general, a solution where a standby cluster makes sure it stays synced against S3 is possible to implement, making the switch-over fast, but the effort will be similar to having the standby cluster synced directly by the primary cluster (and there will be several sync modes: master-slave, in different modes, and master-master).

It's all a bit up in the air. Currently, I want to first tackle multi-site support over a high-speed connection, with special shard allocation logic.
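For reference, pointing a cluster at a shared S3 gateway is just configuration, roughly along these lines in the 0.16-era cloud plugin (bucket name and credentials are placeholders; check the plugin docs for your version):

    # elasticsearch.yml
    gateway:
        type: s3
        s3:
            bucket: my-es-gateway-bucket

    cloud:
        aws:
            access_key: <your-access-key>
            secret_key: <your-secret-key>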

Multi-site support with automatic shard allocation surely sounds interesting.

As for the S3 gateway, am I understanding correctly that when using
it, the indices are updated on the gateway and mirrored locally for
fast access? After a full cluster shutdown and restart, will the data still
be available locally, or do I have to download the whole indices again?
And what about shard relocation: will the data be streamed
from the primary shard over the local network, or from the S3 gateway?

Thanks very much for your answers and patches, which always come really quickly.

Michel


After a full cluster shutdown and startup, the primary shards will be allocated to the nodes whose local data is most similar to the data on S3, and only the missing data is recovered from S3 (that's why it's important to configure things like gateway.recover_after_nodes). Replicas are recovered from primary shards, not from S3, and they also reuse local data where possible.
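(As an illustration, those recovery gates live in elasticsearch.yml; the values below are made up for a hypothetical 10-node cluster:)

    gateway:
        recover_after_nodes: 8     # don't start recovery until 8 nodes have joined
        recover_after_time: 5m     # then wait up to 5 minutes for the rest
        expected_nodes: 10         # recover immediately once all 10 are present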

Thanks for the reply; that finally makes it clear to me how the S3
storage can be used. I wasn't entirely sure how the recovery would be
done.
