Gateway.recover_after_nodes question

I'm trying to understand how to use gateway.recover_after_nodes
setting. I have a 2 node cluster with the following config:

cluster:
name: Awesome

gateway:
type: s3
s3:
bucket: ****
recover_after_nodes: 2

index:
gateway:
snapshot_interval: 60s

cloud:
aws:
access_key: ****
secret_key: ****

discovery:
type: ec2

I created an index with 2 shards and 1 replica and indexed a few
documents into it. Then I shut down the cluster and brought up both
nodes, expecting them to recover from local index storage, but the log
says:

[16:34:39,657][INFO ][cluster.service ] [Black King] added
{[Goldeneye][e8881b4f-7458-4a66-b590-c612da5cc54e][inet[/
10.204.250.116:9300]],}, reason: zen-disco-receive(from
node[[Goldeneye][e8881b4f-7458-4a66-b590-c612da5cc54e][inet[/
10.204.250.116:9300]]])
[16:34:40,988][INFO ][cluster.metadata ] [Black King] [foo]
creating index, cause [gateway], shards [2]/[1], mappings [bar]
[16:34:42,879][INFO ][cluster.metadata ] [Black King] [foo]
created and added to cluster_state

My understanding was that recover_after_nodes = 2 gives the cluster a
chance to recover index from local storage, but apparently not?

-Andrei

recover_after_nodes only means that the actual state recovery (which indices
were created) as well as shard allocation (when using shared gateway) will
happen only after the specific number of nodes has been started in a
cluster.

This makes sense in shared storage gateway mode so locally stored shards can
be reused (shards gets allocated to nodes that have the smallest diff
between their local storage and the shared gateway).

-shay.banon

On Wed, Sep 15, 2010 at 6:38 PM, Andrei andrei@zmievski.org wrote:

I'm trying to understand how to use gateway.recover_after_nodes
setting. I have a 2 node cluster with the following config:

cluster:
name: Awesome

gateway:
type: s3
s3:
bucket: ****
recover_after_nodes: 2

index:
gateway:
snapshot_interval: 60s

cloud:
aws:
access_key: ****
secret_key: ****

discovery:
type: ec2

I created an index with 2 shards and 1 replica and indexed a few
documents into it. Then I shut down the cluster and brought up both
nodes, expecting them to recover from local index storage, but the log
says:

[16:34:39,657][INFO ][cluster.service ] [Black King] added
{[Goldeneye][e8881b4f-7458-4a66-b590-c612da5cc54e][inet[/
10.204.250.116:9300]],}, reason: zen-disco-receive(from
node[[Goldeneye][e8881b4f-7458-4a66-b590-c612da5cc54e][inet[/
10.204.250.116:9300]]])
[16:34:40,988][INFO ][cluster.metadata ] [Black King] [foo]
creating index, cause [gateway], shards [2]/[1], mappings [bar]
[16:34:42,879][INFO ][cluster.metadata ] [Black King] [foo]
created and added to cluster_state

My understanding was that recover_after_nodes = 2 gives the cluster a
chance to recover index from local storage, but apparently not?

-Andrei

Sorry, I am still trying to wrap my head around this.

In my case, the s3 gateway is a shared one (correct?), so I was trying
to get the nodes to recover the state/shards from the local shards,
and only then fetch the shards from S3 if the local recovery was not
possible. Did I miss something?

What is the default value of recovery_after_nodes? In my 2 node
cluster, what is the advantage of setting recovery_after_nodes to 2
instead of 1? Are there guidelines for what it should be, depending on
the number of nodes and shards?

Overall I'm really impressed by the dynamic node discovery and
failover capabilities of ES. Thanks for the great work.

-Andrei

On Sep 15, 10:47 am, Shay Banon shay.ba...@elasticsearch.com wrote:

recover_after_nodes only means that the actual state recovery (which indices
were created) as well as shard allocation (when using shared gateway) will
happen only after the specific number of nodes has been started in a
cluster.

This makes sense in shared storage gateway mode so locally stored shards can
be reused (shards gets allocated to nodes that have the smallest diff
between their local storage and the shared gateway).

-shay.banon

Yea, s3 gateway is a shared one. The recovery process after a full shutdown
always goes to the gateway. The way that the "reuse local storage" mode
works is that, after "recover_after_nodes" have been discovered, when trying
to allocate a specific shard to a node, a diff is one between what data the
gateway has between all the nodes that hold that shard data, and then the
shard is allocated to the node with the "closest" amount of data. This means
that less data will need to be fetched from the gateway in order to complete
the recovery process as the local data on the node can be reused.

In general, the closer the recover_after_nodes is to the actual amount of
nodes you are going to start, the better chances that the recovery process
will complete quickly, as there will be more nodes to check for possible
shard allocate and reuse of the local storage.

The local gateway feature in master works completely different. Recovery
always happen only from the local storage, there is no remote storage in
question. It is much faster and more resilient to corruptions, at the cost
of relying on local storage and replication to provide durability (and not a
shared storage).

Once 0.11 is out I plan to write a whole new section in the docs, which is
more of "vertical" documentation and not module based documentation as the
format of the docs is today. In this section, I plan to cover things like
Gateway (long term persistency), Analysis Paralysis, and other "vertical"
aspects.

-shay.banon

On Thu, Sep 16, 2010 at 12:09 AM, Andrei andrei@zmievski.org wrote:

Sorry, I am still trying to wrap my head around this.

In my case, the s3 gateway is a shared one (correct?), so I was trying
to get the nodes to recover the state/shards from the local shards,
and only then fetch the shards from S3 if the local recovery was not
possible. Did I miss something?

What is the default value of recovery_after_nodes? In my 2 node
cluster, what is the advantage of setting recovery_after_nodes to 2
instead of 1? Are there guidelines for what it should be, depending on
the number of nodes and shards?

Overall I'm really impressed by the dynamic node discovery and
failover capabilities of ES. Thanks for the great work.

-Andrei

On Sep 15, 10:47 am, Shay Banon shay.ba...@elasticsearch.com wrote:

recover_after_nodes only means that the actual state recovery (which
indices
were created) as well as shard allocation (when using shared gateway)
will
happen only after the specific number of nodes has been started in a
cluster.

This makes sense in shared storage gateway mode so locally stored shards
can
be reused (shards gets allocated to nodes that have the smallest diff
between their local storage and the shared gateway).

-shay.banon