After a full cluster shutdown and startup, the primary shards will be
allocated to the nodes whose local data most closely matches the data on
s3, and only the missing data is recovered from s3 (that's why it's
important to configure things like gateway.recover_after_nodes). Replicas
are recovered from primary shards, not from s3, and also reuse local data
if possible.
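To make the reuse idea concrete, here is a rough sketch of the logic described above. It is only an illustration, not the actual elasticsearch implementation: the file names, the checksums, and the helper functions (reusable_files, pick_primary_node, files_to_download) are all hypothetical, and it assumes each store can be modelled as a map from file name to checksum.

```python
# Hypothetical model, not elasticsearch code: each store is a dict
# mapping a file name to a checksum of that file's content.

def reusable_files(local, gateway):
    """Names of local files whose checksum matches the gateway copy."""
    return {name for name, checksum in local.items()
            if gateway.get(name) == checksum}

def pick_primary_node(nodes, gateway):
    """Pick the node that can reuse the most files already on local disk."""
    return max(nodes, key=lambda node_id: len(reusable_files(nodes[node_id], gateway)))

def files_to_download(local, gateway):
    """Gateway files that are missing or stale locally and must be fetched."""
    return sorted(set(gateway) - reusable_files(local, gateway))

# Made-up example data for one shard.
gateway = {"_0.cfs": "a1", "_1.cfs": "b2", "segments_2": "c3"}
nodes = {
    "node1": {"_0.cfs": "a1"},
    "node2": {"_0.cfs": "a1", "_1.cfs": "b2", "segments_2": "stale"},
}

best = pick_primary_node(nodes, gateway)
missing = files_to_download(nodes[best], gateway)
print(best, missing)  # node2 ['segments_2']
```

In this toy example node2 gets the primary because two of its three files match the gateway, so only one file has to come from s3.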
On Friday, May 6, 2011 at 11:47 AM, Michel Conrad wrote:
Multi site with automatic shard allocation surely sounds interesting.
As for the s3 gateway, am I understanding it correctly that when using
it, the indices are updated on the gateway and mirrored locally for
After a full cluster shutdown and restart, will the data still be
available locally or do I have to download the whole indices again?
What about shard relocation: will the data be streamed
from the primary shard over the local network or from the s3 gateway?
Thanks very much for your answers and patches, that come always really
On Fri, May 6, 2011 at 12:13 AM, Shay Banon
The shared s3 gateway can be the main data store for the primary cluster.
And if it fails, you can start another cluster which will use that data. The
problem there is that it will take time to download all the data from s3
before the secondary cluster is operational.
In general, a solution where a standby cluster makes sure it syncs against
s3 is possible to implement, and then the switch will be fast. But the
effort will be similar to having the standby cluster synced by the primary
cluster (and there will be several sync modes: master-slave, in different
modes, and master-master).
It's all a bit up in the air. Currently, I want to first tackle a multi-site
setup with a high-speed connection and special shard allocation logic.
On Friday, May 6, 2011 at 1:03 AM, Michel Conrad wrote:
Oh, I meant s3 instead of ec2. To rephrase my question: would it be
possible to use an es cluster with local storage, which keeps a replica
of the data up to date on s3 storage, so that in the event that
something goes wrong with the local storage one can recover
transparently from the s3 storage, creating a new local storage?
On Thu, May 5, 2011 at 11:36 PM, Shay Banon
Which replica are you referring to? The replica "cluster"? What is ec2
On Thursday, May 5, 2011 at 7:36 PM, Michel Conrad wrote:
I would also love to see multi-DC replication.
So would it also be possible to have a cluster using the local storage
for speed, and the replica using ec2 storage for persistency?
On Thu, May 5, 2011 at 1:00 PM, Paul Smith email@example.com wrote:
Yes. It can be simplified: we can add a disable-flush flag that can be set
dynamically. Open an issue?