Adding S3 gateway on a local-gateway machine

mrflip · December 23, 2010, 11:12pm

We have a machine set up currently with a local gateway (full config
at https://gist.github.com/f003c19dd0ce53c654cb )

    gateway:
      type: local
    index:
      gateway:
        snapshot_interval: -1
        snapshot_on_close: false

We're considering moving the cluster to use the S3 gateway. It's a
16-machine cluster; when all is done it will hold about 11 indexes, 176
shards x 2 (replicas = 1), each of about 5-15GB actual on-disk usage.

Can we switch the cluster over to use the s3 gateway without losing
files?
I know I'll have to trigger a snapshot using, e.g.,
    curl -XPOST 'http://localhost:9200/_gateway/snapshot'
My concern is that once I update the config, I'll have to restart each
data node; will it try to initiate recovery from the (empty) s3
gateway, or can I make it adopt the local files already present and
then push them to S3 after going green?

Also, are there any non-obvious performance implications for pushing
that much data through s3? Will new nodes recover from their peers or
pull from s3?

thanks,
flip

kimchy · December 23, 2010, 11:26pm

Hi,

There is no way to switch from the local gateway to the s3 gateway without
reindexing the data.

Regarding the overhead of s3, there are basically two costs. The first is
the initial recovery on full cluster startup. If you set the
gateway.recovery_after_xxx settings, then shards will be allocated to the
nodes whose local data has the most in common with what is stored in s3, so
the recovery times should be minimal.
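
A minimal config sketch of those recovery settings, assuming the ones meant are `gateway.recover_after_nodes`, `gateway.recover_after_time` and `gateway.expected_nodes`. The values are placeholders sized for the 16-machine cluster described above, not recommendations, and the S3 bucket and credential settings are left out:

```yaml
# elasticsearch.yml fragment (sketch; values are assumptions)
gateway:
  type: s3
  recover_after_nodes: 14   # hold off recovery until this many nodes have joined
  recover_after_time: 5m    # then wait this long for stragglers...
  expected_nodes: 16        # ...unless the full cluster is already up
```

The idea is to let most of the cluster come up before shards are allocated, so each shard can land on a node that already holds most of its files.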

The second problem with s3 is more concerning, which is the need to push
the data to s3. This will require network resources, which are very scarce
on ec2 ;), and will compete with indexing / searching network operations.
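
To get a feel for how much data that push involves, here is a rough back-of-the-envelope sketch in Python. The shard count, shard sizes and node count come from the original post; the per-node upload rate is a purely hypothetical figure, and the sketch assumes only one copy of each shard (not the replicas) is pushed to s3:

```python
# Back-of-the-envelope estimate of the initial upload to S3.
SHARDS = 176                      # primary shards, from the original post
SHARD_GB_RANGE = (5, 15)          # on-disk size per shard, from the original post
NODES = 16                        # machines in the cluster, from the original post
ASSUMED_MB_PER_SEC_PER_NODE = 20  # hypothetical sustained upload rate per node


def upload_estimate(shard_gb, mb_per_sec_per_node=ASSUMED_MB_PER_SEC_PER_NODE):
    """Return (total_gb, hours) needed to push every primary shard once."""
    total_gb = SHARDS * shard_gb
    total_mb = total_gb * 1024
    # Assumes all nodes upload in parallel at the same rate.
    seconds = total_mb / (mb_per_sec_per_node * NODES)
    return total_gb, seconds / 3600


for shard_gb in SHARD_GB_RANGE:
    total_gb, hours = upload_estimate(shard_gb)
    print(f"{shard_gb} GB/shard: {total_gb} GB total, ~{hours:.1f} h")
```

Swap in a measured upload rate for your instance type; whatever bandwidth the upload uses is taken away from indexing and search traffic for that long.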

-shay.banon
