Need Clarification on Shards Replication

ananth · January 2, 2014, 5:43am

I have one ES node acting as both master and data node, and I am indexing
documents into it (1 shard, 1 replica). After indexing some documents (say
1 million, with indexing still running), I added one more data node to the
cluster, and the shards started replicating to the new node. How does this
replication happen? In the meantime I am still indexing new documents into
that index.
1. Will datanode1 send index segments to datanode2?
2. Will datanode1 send documents one by one (as IndexRequests)
  to datanode2 instead of copying segments?
3. Will datanode1 send the whole index to datanode2?

How do the settings indices.store.throttle.type: merge
and indices.store.throttle.max_bytes_per_sec: 50mb behave in this
scenario?

Anantha Govindarajan.

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/326cfecc-b59c-4e4c-b5e9-e369e841a02e%40googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

spinscale · January 2, 2014, 8:32am

Hey,

Replication is done per document (as opposed to relocation). The
document is indexed on the primary first and, if that succeeds, it is
indexed on all replicas of the shard in parallel. Once the index
operation on the replica(s) has returned, the index request returns to
the client.
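
As a rough illustration of that ordering, here is a minimal Python sketch. It is a toy model, not Elasticsearch code: the `Shard` class and the `index_document` function are hypothetical names made up for this example, and real shards do far more than store a dict. It only shows the sequence: primary first, then all replicas in parallel, then the response to the client.

```python
from concurrent.futures import ThreadPoolExecutor


class Shard:
    """Toy stand-in for a shard copy (hypothetical, not an Elasticsearch class)."""

    def __init__(self, name):
        self.name = name
        self.docs = {}

    def index(self, doc_id, source):
        self.docs[doc_id] = source
        return True


def index_document(primary, replicas, doc_id, source):
    """Index on the primary first, then on every replica in parallel.

    Returns only after all replica operations have returned.
    """
    if not primary.index(doc_id, source):
        return False  # primary failed: nothing is sent to the replicas
    if replicas:
        with ThreadPoolExecutor(max_workers=len(replicas)) as pool:
            results = list(pool.map(lambda r: r.index(doc_id, source), replicas))
        return all(results)
    return True


primary = Shard("primary")
replicas = [Shard("replica-1"), Shard("replica-2")]
acknowledged = index_document(primary, replicas, "1", {"title": "hello"})
```

With no replica available (an empty `replicas` list), the call returns as soon as the primary has indexed the document.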

Throttling merges (a heavy, I/O- and CPU-intensive background process)
ensures that enough I/O capacity remains available for index and search
operations.

Hope this helps...

--Alex

ananth · January 2, 2014, 9:39am

Hi Alex

Thanks for replying. If I understand correctly, the normal indexing flow is:

1. With ReplicationType.SYNC, the document is indexed on the primary
  shard's machine, then on the replica shard's machine, and then the
  index response is returned to the client.
2. With ReplicationType.ASYNC, the document is indexed on the primary
  shard's machine and then sent to the replica machine(s), if available,
  without waiting for their response.

But my question is not about the normal indexing flow. I had already
indexed 1 million documents into the primary shard alone; at that point no
node was available for the replica.

Some time later I added a machine to the cluster. From this point, newly
indexed documents follow the normal indexing flow (am I right? Not sure!).
But my question is: how are the existing 1 million documents in the primary
shard replicated to the new machine?

dadoonet · January 2, 2014, 10:32am

It's relocation. Segments are copied over the wire. New update/insert/delete operations that happen in the meantime are replayed from the transaction log on the new shard.
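
To make that sequence concrete, here is a minimal Python sketch under loudly stated assumptions: `ToyShard`, `apply_operation` and `recover_replica` are hypothetical names invented for this example, segment files are modelled as a plain dict, and the real process has more steps than shown. It only illustrates the idea: copy a snapshot of the segments, keep indexing on the primary, then replay the logged operations on the new copy.

```python
import copy


def apply_operation(docs, op):
    """Apply one logged operation (kind, doc_id, source) to a doc_id -> source dict."""
    kind, doc_id, source = op
    if kind == "delete":
        docs.pop(doc_id, None)
    else:  # "index" covers both inserts and updates
        docs[doc_id] = source


class ToyShard:
    """Hypothetical shard copy: documents in segments, plus a transaction log."""

    def __init__(self):
        self.docs = {}       # stands in for the documents stored in segment files
        self.translog = []   # operations recorded since the last snapshot point

    def write(self, op):
        apply_operation(self.docs, op)
        self.translog.append(op)


def recover_replica(primary, operations_during_copy):
    """Copy the primary's segments, then replay operations that arrived meanwhile."""
    replica = ToyShard()
    primary.translog.clear()                 # start recording from the snapshot point
    snapshot = copy.deepcopy(primary.docs)   # step 1: segments go over the wire
    for op in operations_during_copy:        # indexing continues on the primary
        primary.write(op)
    replica.docs = snapshot
    for op in primary.translog:              # step 2: replay the transaction log
        apply_operation(replica.docs, op)
    return replica


primary = ToyShard()
primary.write(("index", "1", {"n": 1}))
primary.write(("index", "2", {"n": 2}))
replica = recover_replica(primary, [("index", "3", {"n": 3}), ("delete", "1", None)])
```

After the replay, the new copy holds the same documents as the primary, including the insert and the delete that happened while the snapshot was being copied.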

HTH

--
David
Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs

ananth · January 2, 2014, 11:02am

Hi David,

Thanks for your reply.

Until the existing (not newly created) segments are fully copied to the new
machine, no indexing operation will happen on the replica shard, right?
Instead, the new documents are only recorded in the transaction log?
(Correct me if I am wrong.)

Once all segments are copied, it replays the transaction log. If so, no new
documents are visible for search until the segment copying is over. Is that
right?

The properties indices.store.throttle.type: merge
and indices.store.throttle.max_bytes_per_sec: 50mb relate only to Lucene
segment merges, am I right?

Ananth

dadoonet · January 3, 2014, 10:52am

Until the existing (not newly created) segments are fully copied to the new machine, no indexing operation will happen on the replica shard, right? Instead, the new documents are only recorded in the transaction log? (Correct me if I am wrong.)
Once all segments are copied, it replays the transaction log. If so, no new documents are visible for search until the segment copying is over. Is that right?

Correct. Replica shard won't be in STARTED state so it won't be searchable.
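
A small Python sketch of that rule, for illustration only. The state names (UNASSIGNED, INITIALIZING, STARTED, RELOCATING) are the shard routing states Elasticsearch reports, but the list-of-dicts layout and the `searchable_copies` helper are hypothetical, and the sketch deliberately looks only at STARTED versus everything else.

```python
def searchable_copies(shard_copies):
    """Return the shard copies that are in STARTED state.

    Simplification of this sketch: any other state is treated as not searchable.
    """
    return [c for c in shard_copies if c["state"] == "STARTED"]


# Hypothetical view of the cluster while the replica is still being copied.
copies = [
    {"node": "datanode1", "primary": True, "state": "STARTED"},
    {"node": "datanode2", "primary": False, "state": "INITIALIZING"},
]
nodes = [c["node"] for c in searchable_copies(copies)]
```

While the replica on datanode2 is still receiving segments, only the primary on datanode1 is picked for searches in this model.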

The properties indices.store.throttle.type: merge and indices.store.throttle.max_bytes_per_sec: 50mb relate only to Lucene segment merges, am I right?

See the whole definition: https://github.com/elasticsearch/elasticsearch/issues/2041
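
For a feel of what a 50mb per second cap means, here is a back-of-the-envelope Python sketch. It assumes the setting is a plain bytes-per-second limit on merge writes and that "mb" means 1024 * 1024 bytes; the helpers `parse_size` and `min_seconds_for` are hypothetical and say nothing about how Elasticsearch or Lucene implement the throttle internally.

```python
def parse_size(value):
    """Parse a size such as '50mb' into bytes (handles only gb, mb, kb, b)."""
    # Longer suffixes come first so that '50mb' is not matched by plain 'b'.
    units = {"gb": 1024 ** 3, "mb": 1024 ** 2, "kb": 1024, "b": 1}
    text = value.strip().lower()
    for suffix, factor in units.items():
        if text.endswith(suffix):
            return int(float(text[: -len(suffix)]) * factor)
    raise ValueError("unsupported size: " + value)


def min_seconds_for(bytes_written, max_bytes_per_sec):
    """Minimum wall-clock time a throttled writer needs for this many bytes."""
    return bytes_written / max_bytes_per_sec


limit = parse_size("50mb")                            # bytes per second
seconds = min_seconds_for(parse_size("1gb"), limit)   # time to write a 1gb merge
```

Under these assumptions, writing a 1gb merged segment takes at least about 20.48 seconds, which leaves the rest of the disk bandwidth to indexing and search.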

HTH

David

ananth · January 3, 2014, 12:58pm

Hi David,

Thanks for your answers. I have a few more questions.

What will happen if datanode1 is restarted while sending segments to
datanode2?

1. When datanode1 starts up again, which shard becomes the primary: the
  one on datanode1 or the one on datanode2?
2. If datanode1 becomes primary, how does it know how many segments were
  transferred to datanode2 before the restart?
3. If datanode2 becomes primary, what happens to the segments written
  earlier that were still pending to be copied?

Ananth
