Shards hot relocation


(Bing Hua) #1

A general question: Say we have a cluster running and constantly getting
index requests coming in. When a new node is brought up, some shards are
re-allocated to this node. What is happening to the source nodes of the
shards? Are they still processing new index requests during shard
relocation? How do they transfer indices while indices are changing?


(Shay Banon) #2

Yes, they are still processing the indexing requests until the relocation
is done. It's done in several stages (the relocation, or recovery for that
matter).


(Bing) #3

Thanks kimchy. It's good to know they still process indexing, but I'm
still curious how they 'process' indexing to a 'moving' shard. Is it
achieved by using the transaction log? So the source node marks the start
of the shard relocation, then transfers the index while recording incoming
requests but not merging them into the index. After the transfer is done,
it sends that part of the transaction log to the new node to have it
process the indexing?

Bing


(Lukáš Vlček) #4

Hi,

I think this question was also answered during Shay's presentation at the
last #BBUZZ.

Here is the video:
http://www.elasticsearch.org/videos/2011/08/09/road-to-a-distributed-searchengine-berlinbuzzwords.html
Here are the slides:
http://2011.berlinbuzzwords.de/sites/2011.berlinbuzzwords.de/files/elasticsearch-bbuzz2011.pdf

More specifically, in the video, skip to [37:25] (although it can be hard
to make Vimeo skip to it). It goes like this:

Q: I am just wondering how the hot failover works, do you use push for
that? Do you replay the transaction log or...
A: Yeah, so you mean the hot relocation?
Q: Yes.
A: When the relocation happens, it gets smart in Lucene itself by making
sure that we do not delete the segment files of Lucene, and we start to
transfer these segment files, and we disable flushing, we do not call
commit anymore on Lucene and we only store the changes into the transaction
log. Once that phase is done (i.e. we copied over all the index files), we
start replaying the transaction log into the replica and then we do the
switch and tell it, ok, there is the new one.
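
The phases described in that answer can be sketched as a small toy model.
This is only an illustration under stated assumptions, not Elasticsearch
code: ToyShard, relocate and writes_during_copy are hypothetical names, a
dict stands in for the committed Lucene segment files, and everything runs
in a single thread, so real concurrency and network transfer are left out.

```python
from dataclasses import dataclass, field


@dataclass
class ToyShard:
    """Hypothetical stand-in for a shard; not an Elasticsearch class."""
    committed: dict = field(default_factory=dict)  # plays the role of committed segment files
    translog: list = field(default_factory=list)   # operations logged since the last commit
    flush_enabled: bool = True

    def index(self, doc_id, source):
        # The shard keeps accepting writes; each one is appended to the translog.
        self.translog.append(("index", doc_id, source))

    def flush(self):
        # A "commit": fold the logged operations into the committed state.
        if not self.flush_enabled:
            return False
        for _op, doc_id, source in self.translog:
            self.committed[doc_id] = source
        self.translog.clear()
        return True


def relocate(source, writes_during_copy):
    """Copy `source` into a new ToyShard while writes keep arriving."""
    source.flush_enabled = False                          # no commits during the file copy
    target = ToyShard(committed=dict(source.committed))   # phase 1: copy committed files
    for doc_id, doc in writes_during_copy:                # writes that arrive mid-copy
        source.index(doc_id, doc)
    for _op, doc_id, doc in source.translog:              # phase 2: replay the translog
        target.index(doc_id, doc)
    source.flush_enabled = True
    return target                                         # the "switch": use the target from now on


source = ToyShard()
source.index("1", {"user": "a"})
source.index("2", {"user": "b"})
source.flush()

target = relocate(source, [("3", {"user": "c"}), ("2", {"user": "b2"})])
target.flush()
print(sorted(target.committed))
```

The point of the sketch is that the copied files never change underneath
the transfer, because nothing is committed while it runs, and the target
catches up only by replaying logged operations.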

HTH
Regards,
Lukas


(Bing) #5

Thanks Lukas, so it looks like my guess was pretty much how it is
implemented, right?
The source node is accepting new index requests but not actually
'processing' them. After a while the list of pending requests is sent
to the new node and processed by the new node.

Bing


(Bing) #6

"we only store the changes into transaction log."
So it's not requests in the transaction log but actual changes? Is there an
example of what this piece of the transaction log looks like?

Bing


(Shay Banon) #7

The data stored in the transaction log is specific to each operation. For
example, when indexing, the source document is stored there, as well as
additional metadata.
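
To make that concrete, here is a rough sketch of what per-operation log
entries could look like. It is an assumption for illustration only: the
class names, the version field, the delete entry and the JSON-lines
encoding are all hypothetical, and the real on-disk format differs.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class IndexOperation:
    """Hypothetical log entry for an index operation."""
    doc_id: str
    source: dict       # the original source document
    version: int = 1   # stands in for "additional metadata"; assumed field
    op: str = "index"


@dataclass
class DeleteOperation:
    """Hypothetical log entry for a delete; it carries no source document."""
    doc_id: str
    version: int = 1
    op: str = "delete"


OPERATION_TYPES = {"index": IndexOperation, "delete": DeleteOperation}


def encode(operation):
    # One JSON line per logged operation (illustrative encoding only).
    return json.dumps(asdict(operation), sort_keys=True)


def decode(line):
    # Rebuild the right entry type from the "op" field.
    data = json.loads(line)
    kind = data.pop("op")
    return OPERATION_TYPES[kind](**data)


entry = IndexOperation(doc_id="1", source={"user": "kimchy", "message": "hello"})
line = encode(entry)
print(line)
print(decode(line) == entry)
```

Each entry holds enough to re-apply the operation on another node, which
is what makes replaying the log during relocation possible.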

