Need Clarification on Shards Replication

I have a single Elasticsearch node acting as both master and data node, and I am indexing documents into it (1 shard + 1 replica). After indexing some documents (say 1 million, with indexing still in progress), I added one more data node to the cluster, and the shards started replicating to the new node. How does this replication happen? In the meantime I am still indexing new documents into that index.

1. Will datanode1 send index segments to datanode2?
2. Will datanode1 send documents one by one (as IndexRequests) to datanode2 instead of copying segments?
3. Will datanode1 send the whole index to datanode2?

How do the settings indices.store.throttle.type: merge and indices.store.throttle.max_bytes_per_sec: 50mb behave in the above test scenario?

Anantha Govindarajan.

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/326cfecc-b59c-4e4c-b5e9-e369e841a02e%40googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

Hey,

Replication is done per document (as opposed to relocation). The document is
indexed on the primary first and, if that succeeds, it is indexed on all
replicas of the shard in parallel. Once the index operation on the replica(s)
has returned, the response to the index request is returned to the client.
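
The flow described above can be sketched in a few lines of Python. This is only an illustration of the ordering (primary first, replicas in parallel, acknowledgement last), not Elasticsearch's actual code; the `Shard` class and `index_document` function are hypothetical names invented for this sketch.

```python
from concurrent.futures import ThreadPoolExecutor


class Shard:
    """Hypothetical in-memory stand-in for one shard copy."""

    def __init__(self, name):
        self.name = name
        self.docs = {}

    def index(self, doc_id, source):
        # Store the document and report success.
        self.docs[doc_id] = source
        return True


def index_document(primary, replicas, doc_id, source):
    """Index on the primary, then on all replicas in parallel, then acknowledge."""
    # Step 1: the primary goes first; if it fails, replicas are never contacted.
    if not primary.index(doc_id, source):
        return False
    if not replicas:
        return True
    # Step 2: send the same document to every replica in parallel.
    with ThreadPoolExecutor(max_workers=len(replicas)) as pool:
        results = list(pool.map(lambda r: r.index(doc_id, source), replicas))
    # Step 3: the client is answered only after all replica operations returned.
    return all(results)
```

With one primary and two replica objects, a successful call leaves the same document in all three copies.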

Throttling merges (a heavy, I/O- and CPU-intensive background process)
ensures that enough I/O capacity remains available for index and search
operations.
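
To get a feel for what a 50mb cap means, the arithmetic can be sketched as follows. This is a rough estimate under stated assumptions, not how Elasticsearch implements throttling: `parse_size` and `min_merge_seconds` are hypothetical helpers, "mb" is assumed to mean 1024 * 1024 bytes, and the cap is assumed to apply only to bytes written by merges.

```python
# Multipliers for the size suffixes handled by this sketch (assumed 1024-based).
UNITS = {"b": 1, "kb": 1024, "mb": 1024 ** 2, "gb": 1024 ** 3}


def parse_size(value):
    """Parse a size string such as '50mb' into a number of bytes."""
    text = value.strip().lower()
    # Check two-letter suffixes before the bare 'b' suffix.
    for suffix in ("kb", "mb", "gb", "b"):
        if text.endswith(suffix):
            return int(float(text[: -len(suffix)]) * UNITS[suffix])
    return int(text)


def min_merge_seconds(merge_bytes, max_bytes_per_sec):
    """Lower bound on the time a merge needs to write merge_bytes under the cap."""
    return merge_bytes / parse_size(max_bytes_per_sec)
```

For example, `min_merge_seconds(1024 ** 3, "50mb")` estimates how long a merge that writes 1 GiB would take at the very least when capped at 50mb per second.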

Hope this helps...

--Alex

Hi Alex,

Thanks for replying. If I understand correctly, the normal indexing flow is:

  • With ReplicationType.SYNC, the document is indexed on the primary shard's
    machine, then on the replica shard's machine, and only then is the index
    response returned to the client.
  • With ReplicationType.ASYNC, the document is indexed on the primary shard's
    machine and then sent to the replica machine(s), if available, without
    waiting for a response.

But my question is not about the normal indexing flow. I had already indexed
1 million documents into the primary shard alone; at that point no node was
available to hold the replica.

Some time later I added a machine to the cluster. From this point on, newly
indexed documents follow the normal indexing flow (am I right? Not sure!).
But my question is: how are the existing 1 million documents in the primary
shard replicated to the new machine?

It's relocation. Segments are copied over the wire. Update, insert, and delete operations that happen in the meantime are replayed from the transaction log on the new shard.
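
The two steps described above (copy the segments, then replay the transaction log) can be sketched like this. It is a deliberately simplified model, not Elasticsearch's recovery code: each segment is modeled as a plain dict of document id to source, and `recover_replica` and the `(operation, doc_id, source)` tuple format are hypothetical names chosen for this sketch.

```python
def recover_replica(primary_segments, translog_ops):
    """Build a new shard copy: copy segments first, then replay the translog."""
    replica = {}
    # Phase 1: copy the segments as they existed when the copy started.
    for segment in primary_segments:
        replica.update(segment)
    # Phase 2: replay, in order, the operations that arrived during the copy.
    for operation, doc_id, source in translog_ops:
        if operation == "delete":
            replica.pop(doc_id, None)
        else:  # "index" or "update" both write the latest source
            replica[doc_id] = source
    return replica
```

Replaying the log in order is what lets the new copy catch up with documents that were indexed, updated, or deleted while the segment files were still in flight.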

HTH

--
David ;-)
Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs

Hi David,

Thanks for your reply.

Until the existing (not newly created) segments are fully copied to the new
machine, no indexing operation will happen on the replica shard, right?
Instead, the newly indexed documents are only recorded in the transaction
log? (Correct me if I am wrong.)

Once all segments are copied, it replays the transaction log. If so, no new
documents are visible for search until the segment copy is over. Is that
right?

The properties indices.store.throttle.type: merge and
indices.store.throttle.max_bytes_per_sec: 50mb relate only to Lucene segment
merges, am I right?

Ananth

Until the existing (not newly created) segments are fully copied to the new machine, no indexing operation will happen on the replica shard, right? Instead, the newly indexed documents are only recorded in the transaction log? (Correct me if I am wrong.)
Once all segments are copied, it replays the transaction log. If so, no new documents are visible for search until the segment copy is over. Is that right?

Correct. The replica shard won't be in the STARTED state, so it won't be searchable.
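
As a small illustration of that point, a search could be routed only to copies that have reached STARTED. This is a sketch that follows the statement above, not Elasticsearch's routing code; the `ShardState` enum and `searchable_copies` function are hypothetical, and only the two states relevant to this thread are modeled.

```python
from enum import Enum


class ShardState(Enum):
    """The two shard states relevant to this scenario."""
    INITIALIZING = "INITIALIZING"  # still recovering (copying segments, replaying translog)
    STARTED = "STARTED"            # recovery finished


def searchable_copies(copies):
    """Return the names of the shard copies that may serve searches."""
    return [name for name, state in copies if state is ShardState.STARTED]
```

While datanode2 is still recovering, `searchable_copies([("datanode1", ShardState.STARTED), ("datanode2", ShardState.INITIALIZING)])` yields only datanode1.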

The properties indices.store.throttle.type: merge and indices.store.throttle.max_bytes_per_sec: 50mb relate only to Lucene segment merges, am I right?

See the whole definition: https://github.com/elasticsearch/elasticsearch/issues/2041

HTH

David

Hi David,

Thanks for your answers. I have a few more questions.

What will happen if datanode1 is restarted while it is sending segments to
datanode2?

  1. When datanode1 starts up again, which copy becomes the primary shard:
    the one on datanode1 or the one on datanode2?

  2. If datanode1 becomes primary, how does it know how many segments were
    transferred to datanode2 before the restart?

  3. If datanode2 becomes primary, what happens to the segments written
    earlier that were still waiting to be copied?

Ananth
