Impact of using S3 Gateway

James_Cook · August 21, 2011, 8:26pm

I've been writing the ES/EC2 tutorial that Clinton put me up to.

I would like to include some content about the downsides of using the S3
Gateway.

I suppose performance isn't an issue since ES will write asynchronously. I
know the performance of a recovery is slower because of the EC2/S3 IO
throughput.

As I see it, there must be some kind of potential consistency problem since
ES writes to the S3 gateway asynchronously. If the asynchronous write fails
because of an abrupt termination of a cluster node, isn't it possible that
the gateway will be in an inconsistent state? I suppose that some other node
will be the destination of the failed primary shard and could potentially
correct the problem. If the primary and replica nodes for a shard all fail,
couldn't the gateway get out of sync?

If yes, can the gateway be corrupted, or will it just recover to a prior
valid state?

kimchy · August 22, 2011, 6:39pm

Hi James,

I just saw the tutorial you put up. I will go over it today and update it
where relevant, but let me answer quickly here as well.

The first major downside of using the S3 gateway is the network overhead of
transferring the (delta) index files and persisting them on S3. The second
downside is that it's written asynchronously, so changes you made that have
not yet been persisted will be lost on a full cluster shutdown (but the last
"fully" persisted state is always available).

-shay.banon
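The two points in the reply above (only the delta of index files is shipped, and a crash mid-upload loses unpersisted changes but leaves the last fully persisted state intact) can be illustrated with a toy model. This is a minimal sketch, not the gateway's actual code. It assumes that Lucene files are write-once (so a file already in the bucket never needs re-sending) and that a snapshot becomes visible only when a commit point listing its files is written last. `BlobStore`, `snapshot`, `recover`, `UploadInterrupted` and the `commit-N` naming are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class BlobStore:
    """Hypothetical stand-in for an S3 bucket: a flat map of blob name -> content."""
    blobs: dict = field(default_factory=dict)


class UploadInterrupted(Exception):
    """Simulates a node dying partway through a snapshot upload."""


def snapshot(store, local_files, generation, fail_after=None):
    """Upload only the files missing from the store (the delta), then write a
    commit point listing every file in this snapshot. Returns the delta.

    Assumption: index files are write-once, so a name already present in the
    store never has to be re-uploaded.
    """
    delta = [name for name in sorted(local_files) if name not in store.blobs]
    for i, name in enumerate(delta):
        if fail_after is not None and i == fail_after:
            raise UploadInterrupted(f"node lost after {i} of {len(delta)} files")
        store.blobs[name] = local_files[name]
    # Hypothetical commit point, written last: until it exists, recovery
    # never sees the files uploaded for this generation.
    store.blobs[f"commit-{generation}"] = sorted(local_files)
    return delta


def recover(store):
    """Rebuild the index from the newest commit point, ignoring orphaned uploads."""
    commits = [name for name in store.blobs if name.startswith("commit-")]
    if not commits:
        return {}
    latest = max(commits, key=lambda name: int(name.split("-")[1]))
    return {name: store.blobs[name] for name in store.blobs[latest]}


store = BlobStore()
gen1 = {"_0.cfs": b"seg0", "segments_1": b"s1"}
snapshot(store, gen1, generation=1)

# New documents produce a new segment; the node dies mid-upload.
gen2 = {"_0.cfs": b"seg0", "_1.cfs": b"seg1", "segments_2": b"s2"}
try:
    snapshot(store, gen2, generation=2, fail_after=1)
except UploadInterrupted:
    pass

print(sorted(recover(store)))  # ['_0.cfs', 'segments_1']
```

In this model the gateway cannot be left half-written from the reader's point of view: the orphaned `_1.cfs` is ignored, recovery falls back to generation 1, and whatever generation 2 added is lost, which is the trade-off described in the reply.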
