Network latency while adding new replica


(Tarun Jangra) #1

I am confused about this. ElasticSearch follows a push replication model rather than a pull replication model, so chunks of data are pushed to the replicas. Say I have 5 nodes and I need to add more nodes to increase resources. Then obviously shards have to be replicated to the new nodes. But if a shard holds millions of documents, what happens to the data while the shard is being copied to a new replica, given that the system stays available during the copy? How are updates that reach the shard during the copy handled? And does the time until a new replica becomes available depend on the size of the shard being copied, so that it varies accordingly?


(Shay Banon) #2

Yes, the time it takes to move shards around depends on their size.
Indexing operations can still occur during the copy because each shard
keeps a transaction log, which makes sure the copy process can take place
while indexing is still happening (by replaying the log against the copy
when needed).

On Wed, Oct 5, 2011 at 1:37 PM, tarun.jangra tarun@izap.in wrote:

--
View this message in context:
http://elasticsearch-users.115913.n3.nabble.com/Network-latency-while-adding-new-replica-tp3396237p3396237.html
Sent from the ElasticSearch Users mailing list archive at Nabble.com.


(Tarun Jangra) #3

So that means there is no chance of data loss unless the whole cluster is down?


(Shay Banon) #4

That's a different question than moving shards around, and you won't lose
data even if the whole cluster is down, as long as you bring it back up
using the same data location.

On Thu, Oct 6, 2011 at 10:23 PM, Tarun tarun@izap.in wrote:



(Tarun Jangra) #5

Hi kimchy,
Thanks for your prompt reply. Suppose I need a highly available system for 10 million documents. What is the best combination of shards and replicas on the Amazon cloud, and why?


(Shay Banon) #6

The number of shards really depends on the document structure, the size of
the documents, the number of fields, and so on. You will need to test a
bit. As for the number of replicas, it controls how highly available you
want the index to be. With number_of_replicas set to 1, you will have two
copies of your data: the primary and one replica.
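
As a rough illustration of that arithmetic, the Python sketch below builds a settings body of the kind used when creating an index and counts the resulting copies. The shard count of 5 is an arbitrary placeholder, not a recommendation, and `total_copies` is a hypothetical helper; only the `number_of_shards` and `number_of_replicas` setting names come from ElasticSearch itself.

```python
import json


def total_copies(number_of_shards, number_of_replicas):
    """Each primary shard has number_of_replicas additional copies."""
    return number_of_shards * (1 + number_of_replicas)


# Placeholder values; the right shard count has to come from testing.
settings = {"settings": {"number_of_shards": 5, "number_of_replicas": 1}}

body = json.dumps(settings)
shards = settings["settings"]["number_of_shards"]
replicas = settings["settings"]["number_of_replicas"]

print(body)
print(1 + replicas)                    # copies of each document: 2
print(total_copies(shards, replicas))  # shard copies across the cluster: 10
```

Raising number_of_replicas to 2 would give three copies of each document, at the cost of more disk space and more shards to spread across the nodes.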

On Thu, Oct 6, 2011 at 10:45 PM, Tarun tarun@izap.in wrote:


