How does a recovering node validate any shard information/data during recovery?


(Paul Smith) #1

Had an interesting question I couldn't answer today. When an existing node
of a cluster is shut down or terminated and then starts up again, it
goes through a recovery phase. I wanted to understand a little more
about what checks a node performs to decide whether it has valid state, and
how it rebuilds its shard metadata and content if it doesn't.

In a fairly static index that doesn't change, is there a fast path to
confirm that the recovering node already has the latest data and there's
no need to re-replicate anything? Are existing replicas used as a source
for copying data, or does it always favour retrieving from the node holding
the primary copy of a given shard?

This is the meat and potatoes of what ES is doing for us, and on the one
hand, "we shouldn't need to know", but...

thanks,

Paul


(Shay Banon) #2

If a single node fails, a few things happen. First, if that node held
any elected primary shards, a replica of each one on the remaining nodes
is promoted to primary. In addition, the shards that were allocated on that
node begin to be allocated on the remaining nodes (throttled).
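
To make the failover step above concrete, here is a rough toy model. It is not how elasticsearch implements it: the routing table layout and the name handle_node_failure are assumptions made up for illustration, and throttling is not modelled at all.

```python
from typing import Dict, List, Tuple

# Hypothetical toy routing table: shard id -> list of (node name, role).
Routing = Dict[int, List[Tuple[str, str]]]


def handle_node_failure(routing: Routing, failed: str) -> Tuple[Routing, List[int]]:
    """Drop the copies held by the failed node, promote a surviving replica
    where the primary was lost, and list shards that need a new copy."""
    new_routing: Routing = {}
    needs_allocation: List[int] = []
    for shard, copies in routing.items():
        survivors = [(node, role) for node, role in copies if node != failed]
        if len(survivors) < len(copies):
            # A copy lived on the failed node, so one must be allocated elsewhere.
            needs_allocation.append(shard)
        if survivors and not any(role == "primary" for _, role in survivors):
            # The primary was lost: promote the first surviving replica.
            node, _ = survivors[0]
            survivors[0] = (node, "primary")
        new_routing[shard] = survivors
    return new_routing, needs_allocation
```

In this sketch the returned list is only the work queue; a real cluster would spread that allocation out over time rather than doing it all at once.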

When the node comes back, it joins the cluster and starts
to be assigned shards. If not all of the shards from that node have been
reassigned to the rest of the cluster yet, they will prefer to be allocated on
that node, provided it holds similar data (checksummed index files).

The above is a specific case of more general logic in elasticsearch. When a
replica shard goes through the allocation logic, it prefers to
be allocated on a node whose index files most closely match those of the
relevant primary shard, so as to reduce the amount of data transferred from the
primary shard to the replica shard.
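
As a minimal sketch of that preference, assuming each node can report a checksum and size per index file: the candidate that already holds the most bytes identical to the primary wins. The names matching_bytes and pick_node, the file names, and the scoring by bytes are all assumptions for illustration, not the actual elasticsearch allocator.

```python
from typing import Dict, Optional, Tuple

# Hypothetical per-shard file listing: file name -> (checksum, size in bytes).
FileMeta = Dict[str, Tuple[str, int]]


def matching_bytes(primary: FileMeta, candidate: FileMeta) -> int:
    """Bytes the candidate already holds that are identical to the primary's."""
    return sum(
        size
        for name, (checksum, size) in primary.items()
        if candidate.get(name) == (checksum, size)
    )


def pick_node(primary: FileMeta, candidates: Dict[str, FileMeta]) -> Optional[str]:
    """Prefer the node whose local files most closely match the primary."""
    if not candidates:
        return None
    # Sorting first makes ties resolve to the alphabetically first node.
    return max(sorted(candidates), key=lambda node: matching_bytes(primary, candidates[node]))


primary = {"_0.cfs": ("aaa", 1000), "_1.cfs": ("bbb", 500), "segments_2": ("ccc", 10)}
candidates = {
    "node_a": {"_0.cfs": ("aaa", 1000)},                         # 1000 matching bytes
    "node_b": {"_0.cfs": ("zzz", 1000), "_1.cfs": ("bbb", 500)},  # 500 matching bytes
    "node_c": {},                                                 # nothing on disk
}
print(pick_node(primary, candidates))
```

A node with an empty data directory is still a valid target in this sketch; it simply scores zero and needs a full copy.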

On Mon, Aug 1, 2011 at 8:34 AM, Paul Smith tallpsmith@gmail.com wrote:



(Paul Smith) #3

> When the node comes back, it joins the cluster and starts
> to be assigned shards. If not all of the shards from that node have been
> reassigned to the rest of the cluster yet, they will prefer to be allocated on
> that node, provided it holds similar data (checksummed index files).

If a node goes down for an extended period, and the shard
data on it is by then pretty much out of date, I presume there's some point
where it goes "ok, stuff it, 'rm -rf'" and gets a fresh copy from the primary?


(Shay Banon) #4

On Mon, Aug 1, 2011 at 10:02 AM, Paul Smith tallpsmith@gmail.com wrote:

>> When the node comes back, it joins the cluster and starts
>> to be assigned shards. If not all of the shards from that node have been
>> reassigned to the rest of the cluster yet, they will prefer to be allocated on
>> that node, provided it holds similar data (checksummed index files).
>
> If a node goes down for an extended period, and the shard
> data on it is by then pretty much out of date, I presume there's some point
> where it goes "ok, stuff it, 'rm -rf'" and gets a fresh copy from the primary?

Yes, effectively this is what happens: the data on the "live" nodes will have
diverged from what is stored in the shards on the node that went down, so
few, if any, of its index files will still match.
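
One way to picture both ends of this (the fast path for a static index, and the effective "rm -rf" for a stale one) is a checksum diff between the primary's files and the local copy. This is only a sketch under assumptions: plan_recovery and the file names are made up, and the real recovery code is not shown in this thread.

```python
from typing import Dict, List, Tuple

# Hypothetical per-shard file listing: file name -> (checksum, size in bytes).
FileMeta = Dict[str, Tuple[str, int]]


def plan_recovery(primary: FileMeta, local: FileMeta) -> Tuple[List[str], List[str]]:
    """Return (files to copy from the primary, local files to delete)."""
    # Copy anything that is missing locally or whose checksum/size differs.
    to_copy = sorted(name for name, meta in primary.items() if local.get(name) != meta)
    # Delete local files the primary no longer has at all.
    to_delete = sorted(name for name in local if name not in primary)
    return to_copy, to_delete


primary = {"_3.cfs": ("ddd", 2000), "segments_5": ("eee", 12)}

# Static index: the local copy is identical, so there is nothing to transfer.
print(plan_recovery(primary, dict(primary)))

# Node was down for a long time: no file matches, so everything is replaced.
stale = {"_0.cfs": ("aaa", 1000), "segments_2": ("ccc", 10)}
print(plan_recovery(primary, stale))
```

In the stale case every primary file ends up in the copy list and every old local file in the delete list, which amounts to a fresh copy without any explicit age cut-off.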

