I have a single ES node acting as both master and data node, and I am
indexing documents into it (1 shard, 1 replica). After indexing some
documents (say 1 million, with indexing still running), I add one more data
node to the cluster, and the shard starts replicating to the new node. How
does this replication happen? In the meantime I am still indexing new
documents into that index.
Will datanode1 send index segments to datanode2?
Will datanode1 send documents one by one (as IndexRequests) to datanode2
instead of copying segments?
Will datanode1 send the whole index to datanode2?
How do the settings indices.store.throttle.type: merge and
indices.store.throttle.max_bytes_per_sec: 50mb behave in this scenario?
Replication is done per document (as opposed to relocation). The document
is indexed on the primary first, and if that succeeds, it is indexed on all
replicas of the shard in parallel. Once the index operation on the
replica(s) has returned, the index request returns to the client.
Throttling merges (a heavy, I/O- and CPU-intensive background process)
ensures that enough I/O capacity remains available for index and search
operations.
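To make the per-document flow described above concrete, here is a minimal toy sketch in Python. It is not Elasticsearch code: the `Shard` class and the `index_document` function are hypothetical names invented for illustration, and each shard copy is modelled as a plain dictionary.

```python
from concurrent.futures import ThreadPoolExecutor


class Shard:
    """Toy stand-in for one shard copy (hypothetical, not an ES class)."""

    def __init__(self, name):
        self.name = name
        self.docs = {}  # doc_id -> document

    def index(self, doc_id, doc):
        self.docs[doc_id] = doc
        return True


def index_document(primary, replicas, doc_id, doc):
    """Index on the primary first, then on all replicas in parallel.

    Returns True only after the primary and every replica have returned,
    which mirrors the synchronous flow described in the answer.
    """
    if not primary.index(doc_id, doc):
        return False  # the primary failed, so replicas are never contacted
    if not replicas:
        return True  # no replica is allocated yet (single-node case)
    with ThreadPoolExecutor(max_workers=len(replicas)) as pool:
        results = list(pool.map(lambda r: r.index(doc_id, doc), replicas))
    return all(results)
```

With an empty `replicas` list, the function behaves like the single-node situation in the question: the primary alone acknowledges the request.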
Thanks for replying. If I understand correctly, the normal indexing flow is:
With ReplicationType.SYNC, the document is indexed on the primary shard's
machine, then on the replica shard's machine, and then the index response
is returned to the client.
With ReplicationType.ASYNC, the document is indexed on the primary shard's
machine, then sent to the replica machine(s) if available, without waiting
for their response.
But my question is not about the normal indexing flow. I have already
indexed 1 million documents into the primary shard alone; at that point no
node was available to hold the replica.
Some time later I add a machine to the cluster. From this point, newly
indexed documents follow the normal indexing flow (am I right? Not sure!).
My question is: how are the existing 1 million documents in the primary
shard replicated to the new machine?
It's relocation. Segments are copied over the wire. Update, insert, and delete operations that happen in the meantime are replayed from the transaction log on the new shard.
HTH
--
David
Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs
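The two phases described in the reply above (copy the existing segments, then replay the operations logged in the meantime) can be sketched as a toy model. This is only an illustration under simplifying assumptions, not the actual Elasticsearch recovery code: `recover_replica` is a hypothetical function, segments are modelled as dictionaries, and the transaction log is a list of `(op, doc_id, doc)` tuples.

```python
def recover_replica(primary_segments, ops_during_copy):
    """Toy model of bringing up a new shard copy (hypothetical helper).

    primary_segments: list of dicts, each mapping doc_id -> document,
                      standing in for existing segment files.
    ops_during_copy:  list of (op, doc_id, doc) tuples recorded in the
                      transaction log while the segments were being copied.
    """
    replica = {"state": "INITIALIZING", "docs": {}}

    # Phase 1: copy the existing segments over the wire.
    for segment in primary_segments:
        replica["docs"].update(segment)

    # Phase 2: replay operations that arrived during the copy.
    for op, doc_id, doc in ops_during_copy:
        if op == "delete":
            replica["docs"].pop(doc_id, None)
        else:  # "index" or "update"
            replica["docs"][doc_id] = doc

    # Only now is the copy marked as started.
    replica["state"] = "STARTED"
    return replica
```

In this model the copy stays in the `INITIALIZING` state until both phases have finished.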
Until the existing (not newly created) segments are fully copied to the new
machine, no indexing operations happen on the replica shard, right? Instead,
the newly indexed documents are only recorded in the transaction log?
(Correct me if I am wrong.)
Once all segments are copied, it replays the transaction log. If so, no new
documents are visible for search until the segment copy is finished. Is
that right?
The properties indices.store.throttle.type: merge and
indices.store.throttle.max_bytes_per_sec: 50mb relate only to Lucene
segment merges, am I right?
Correct on both points. During recovery the replica shard is not in the STARTED state, so it is not searchable.
For indices.store.throttle.type and indices.store.throttle.max_bytes_per_sec, see the whole definition: https://github.com/elasticsearch/elasticsearch/issues/2041