Question about s3 gateway vs EBS

I'm about to set up a new cluster on ec2 and the idea of using the s3
gateway looks very appealing - if I can avoid dealing with EBS variable
latencies, striping etc. it would be great. However, I got the sense from
reading posts here and elsewhere that s3 is not the preferable solution if
I am to expect reasonably high load on my cluster - is that the general
feeling? If so I'm comfortable enough with EBS that I'd go with it instead.

S3 will not help you alleviate any latency issues since its
performance is no better than EBS.

On Fri, Jan 6, 2012 at 4:08 PM, Gautam golwala@gmail.com wrote:

I'm about to set up a new cluster on ec2 and the idea of using the s3
gateway looks very appealing - if I can avoid dealing with EBS variable
latencies, striping etc. it would be great. However, I got the sense from
reading posts here and elsewhere that s3 is not the preferable solution if I
am to expect reasonably high load on my cluster - is that the general
feeling? If so I'm comfortable enough with EBS that I'd go with it instead.

Yea, s3 basically just snapshots the current state of the locally stored
indexes; you still need to have the index stored locally. It can be on
ephemeral drives or EBS.

On Sat, Jan 7, 2012 at 3:55 AM, Ivan Brusic ivan@brusic.com wrote:

S3 will not help you alleviate any latency issues since its
performance is no better than EBS.

Shay, is there any "definitive guide" explaining the differences
between S3 and EBS based persistence?

The way I see it, it's about:

a) S3 is easy to set up and use, it works "out of the box"
b) recreating the state of a large cluster will be painful from S3,
since the data must be physically copied to machines; with EBS, the
data is "already there"

Correct?

Thanks!,

Karel

On Jan 7, 8:37 pm, Shay Banon kim...@gmail.com wrote:

Yea, s3 basically just snapshots the current state of the local stored
indexes, you still need to have the index be stored locally. It can
be ephemeral drives or EBS.

Yea, b) is the important part. Also, while the system is running, there is
a need to keep updating s3 with the current state of the index (in a shared
gateway mode).

In the future, I hope to combine the two: allow using the local gateway
(EBS or not), snapshotting the state to s3 whenever you want, and
recovering from it if needed.

On Sun, Jan 8, 2012 at 8:52 AM, Karel Minařík karel.minarik@gmail.com wrote:

Shay, is there any "definitive guide" explaining the differences
between S3 and EBS based persistence?

The way I see it, it's about:

a) S3 is easy to set up and use, it works "out of the box"
b) recreating the state of a large cluster will be painful from S3,
since the data must be physically copied to machines; with EBS, the
data is "already there"

Correct?

Thanks!,

Karel

Or memory

You mean indices that are memory based? Yea, in this case, the s3 gateway
option is pretty cool since you can store the index completely in memory,
but still be able to recover if nodes fail (even if all replicas are gone).

On Mon, Jan 9, 2012 at 6:36 AM, James Cook jcook@pykl.com wrote:

Or memory

Yeah, the memory option would be a pretty powerful option for some
well specified cases!

At the moment, we're using S3 on AWS; yesterday I did a quick test and
I was very satisfied with the recovery speed. The setup:

  • One index, ~600,000 docs
  • 6.5GB

The full recovery cycle -- launch two m1.large nodes (<1min),
bootstrap and provision them with Chef (<3min), and wait for "green"
state -- took something like 10 minutes. Effectively ~1GB per minute.
That's enough "realtime" in my book :)
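
For anyone who wants to redo the arithmetic with their own numbers, here is a
rough sketch in Python. It takes the figures quoted above (6.5GB, about 10
minutes in total, launch and Chef run treated as about 4 minutes of fixed
overhead) and works out the throughput of the copy phase. The function names
and the 100GB example are hypothetical, and the extrapolation assumes recovery
time grows linearly with index size, which was not measured here.

```python
# Figures reported in the post above; the split between phases is approximate.
INDEX_SIZE_GB = 6.5
TOTAL_MINUTES = 10.0      # launch to "green", roughly
LAUNCH_MINUTES = 1.0      # upper bound for starting the two nodes
PROVISION_MINUTES = 3.0   # upper bound for the Chef run


def recovery_throughput_gb_per_min(size_gb, total_min, overhead_min):
    """Return GB/min for the copy phase, excluding fixed startup overhead."""
    copy_min = total_min - overhead_min
    if copy_min <= 0:
        raise ValueError("overhead must be smaller than the total time")
    return size_gb / copy_min


def estimated_recovery_minutes(size_gb, throughput, overhead_min):
    """Extrapolate total recovery time for another index size.

    Assumption: time scales linearly with size (hypothetical, not measured).
    """
    return overhead_min + size_gb / throughput


overhead = LAUNCH_MINUTES + PROVISION_MINUTES
throughput = recovery_throughput_gb_per_min(INDEX_SIZE_GB, TOTAL_MINUTES, overhead)
print(f"copy phase: ~{throughput:.2f} GB/min")
print(f"100 GB index: ~{estimated_recovery_minutes(100, throughput, overhead):.0f} min")
```

The same linear estimate shows why recovering a much larger cluster from S3
gets painful: the fixed overhead stays small, but the copy time grows with
every GB that has to be pulled back onto the machines.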

Karel

On Jan 9, 9:02 pm, Shay Banon kim...@gmail.com wrote:

You mean indices that are memory based? Yea, in this case, the s3 gateway
option is pretty cool since you can store the index completely in memory,
but still be able to recover if nodes fail (even if all replicas are gone).
