Apologies if this has been answered before; I've had a look at the
docs and archives but may have missed something. We've got a single
index on 0.18.6 running on an EC2 cluster that we want to back with
the s3 gateway. We generate a few GB of documents a day at a regular
pace (no huge spikes), and set a TTL of 15 days on all documents. As
such, we're hoping that the size of the data stored in s3 will remain
reasonably constant over time. Two questions:
1 - is that much data too much to hope to push to s3 all the time?
2 - will elasticsearch remove old data from s3 as it becomes unneeded?
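For context, we set the 15 day TTL through the _ttl field in the type mapping, roughly along these lines (index and type names are placeholders):

  curl -XPUT 'http://localhost:9200/myindex/mytype/_mapping' -d '{
    "mytype" : {
      "_ttl" : { "enabled" : true, "default" : "15d" }
    }
  }'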
First, my general recommendation is to use the local gateway on EC2; it's
considerably more lightweight compared to the s3 gateway. You can still
back it up, of course.
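For reference, switching between the two comes down to the gateway settings in elasticsearch.yml; roughly something like this (assuming the cloud-aws plugin is installed, bucket name and credentials are placeholders):

  # local gateway (the default), state recovered from each node's data directory
  gateway.type: local

  # shared s3 gateway
  gateway.type: s3
  gateway.s3.bucket: my-es-gateway-bucket
  cloud.aws.access_key: YOUR_AWS_KEY
  cloud.aws.secret_key: YOUR_AWS_SECRET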
1 - is that much data too much to hope to push to s3 all the time?
It should be fine.
2 - will elasticsearch remove old data from s3 as it becomes unneeded?
Not as data gets deleted. Deleted documents are only marked as deleted, and
later on, as the index performs merges, they will be merged out.
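If the expired documents start taking up noticeable space in the meantime, an optimize call with only_expunge_deletes will merge out just the deleted docs; roughly (index name is a placeholder):

  curl -XPOST 'http://localhost:9200/myindex/_optimize?only_expunge_deletes=true'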
Thanks for the response. Our motivation to use the s3 gateway is that
for some of our environments we have a lot of documents on relatively
small clusters (because load is low), and we found ourselves firing
up more EC2 instances just to get more local disk space, which isn't
very cost effective. We're trialling the s3 gateway for a couple of
weeks to see how it goes; we may investigate other options if we have
problems.
Yes, because recovery on a full cluster restart relies on the latest
snapshotted data, while if it was based on the local gateway by default,
and only on backups in the worst case scenario, then it's a different story.
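With a shared gateway, the snapshot interval controls how often the latest state gets pushed out, and a snapshot can also be triggered explicitly; roughly (the interval shown is only an example):

  # elasticsearch.yml
  index.gateway.snapshot_interval: 10s

  # trigger a gateway snapshot on demand
  curl -XPOST 'http://localhost:9200/_gateway/snapshot'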