Elasticsearch replication protocol between datacenters?


(Jörg Prante) #1

Hi,

How about BitTorrent: could it be a feasible protocol for future
Elasticsearch replication between datacenters? Is the idea good or
bad? Pros and cons? Any comments welcome.

I just stumbled upon this post, where the BitTorrent protocol is used to
overcome Solr index replication deficiencies (but within a datacenter,
I presume):

http://codeascraft.etsy.com/2012/01/23/solr-bittorrent-index-replication/

Jörg


(David Pilato) #2

That's a very interesting article.
They decreased the replication time from 60 min to 6 min by using BitTorrent instead of HTTP!
That's really significant.

David

In reply to jprante (joergprante@gmail.com), 28 Jan 2012 at 10:26.



(Paul Loy) #3

We use BitTorrent for our deploys. It has cut our deployment time to 10% of
what it was, too.

But I think this is where Solr vs. Elasticsearch becomes interesting. Solr,
if I am correct, is master-slave: one index is replicated but not sharded.
ES is sharded and replicated, so each replica of an individual shard is
much smaller than the entire index, roughly 1/shards of its size.
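To put a number on that, here is a back-of-the-envelope sketch. It assumes documents are spread evenly across shards, and the 100 GB index and 5 shards are made-up figures for illustration only:

```python
def shard_copy_size_gb(index_size_gb: float, num_shards: int) -> float:
    """Approximate size of one shard copy, assuming an even spread of documents."""
    if num_shards < 1:
        raise ValueError("num_shards must be at least 1")
    return index_size_gb / num_shards


# Hypothetical figures: a 100 GB index split into 5 shards.
print(shard_copy_size_gb(100, 5))  # 20.0
```

So a node that has to fetch one shard copy moves about 20 GB in this example, not the full 100 GB that a whole-index copy would need.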

Second, I think ES pushes changes to replicas as deltas already rather than
needing BitTorrent to hash and tell us what has changed.
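For reference, the "hash and tell us what has changed" part can be sketched roughly as below. This is a simplified illustration of BitTorrent-style piece hashing (SHA-1 over fixed-size pieces), not code from Etsy's setup or from any BitTorrent client. The function names and the tiny piece size are assumptions chosen for readability:

```python
import hashlib
from typing import List

PIECE_SIZE = 4  # illustration only; real torrents use far larger pieces


def piece_hashes(data: bytes, piece_size: int = PIECE_SIZE) -> List[str]:
    """Split data into fixed-size pieces and return the SHA-1 hex digest of each."""
    return [
        hashlib.sha1(data[i:i + piece_size]).hexdigest()
        for i in range(0, len(data), piece_size)
    ]


def changed_pieces(old: bytes, new: bytes, piece_size: int = PIECE_SIZE) -> List[int]:
    """Return indexes of pieces in `new` that differ from, or are missing in, `old`.

    Pieces that exist only in `old` (a truncated file) are not reported.
    """
    old_h = piece_hashes(old, piece_size)
    new_h = piece_hashes(new, piece_size)
    return [i for i, h in enumerate(new_h) if i >= len(old_h) or old_h[i] != h]


old = b"aaaabbbbcccc"
new = b"aaaaXbbbcccc"
print(changed_pieces(old, new))  # [1]
```

Only the piece at index 1 would need to be fetched again here. That is the work ES would not need if it already ships changes to replicas as they happen.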

So in the general usage pattern I don't think it'll help out that much. But
what about failure cases? Perhaps when you spin up a new node, that's when
a BitTorrent protocol could make you some savings in ES?

Paul.

In reply to David Pilato (david@pilato.fr), Sat, Jan 28, 2012 at 2:07 AM.


--

Paul Loy
paul@keteracel.com
http://uk.linkedin.com/in/paulloy


(Shay Banon) #4

Replication in Elasticsearch is different from Solr's current replication mode: it does not replicate internal index segments, where BitTorrent can make some sense; it replicates operations.
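
As a rough illustration of what "replicates operations" means, here is a toy in-memory model. It is not Elasticsearch code: the `Shard` and `Primary` classes and the tuple-shaped operations are invented for this sketch. The point is that each index or delete operation is applied on the primary and then forwarded to every replica, so no index files are ever copied and there is nothing for a file-transfer protocol to speed up:

```python
class Shard:
    """Toy shard copy: a dict mapping document id to document body."""

    def __init__(self):
        self.docs = {}

    def apply(self, op):
        """Apply one operation, given as a (kind, doc_id, body) tuple."""
        kind, doc_id, body = op
        if kind == "index":
            self.docs[doc_id] = body
        elif kind == "delete":
            self.docs.pop(doc_id, None)
        else:
            raise ValueError(f"unknown operation: {kind}")


class Primary(Shard):
    """Toy primary: applies each operation locally, then forwards it to replicas."""

    def __init__(self, replicas):
        super().__init__()
        self.replicas = replicas

    def execute(self, op):
        self.apply(op)
        for replica in self.replicas:
            replica.apply(op)  # the operation is sent, not the index files


replica = Shard()
primary = Primary([replica])
primary.execute(("index", "1", {"title": "hello"}))
primary.execute(("index", "2", {"title": "world"}))
primary.execute(("delete", "1", None))
print(replica.docs)  # {'2': {'title': 'world'}}
```

In this model the replica ends up with the same documents as the primary after every operation. A segment-based scheme would instead ship whole files after the fact, which is where a piece-wise transfer protocol could help.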

In reply to Paul Loy, Sunday, January 29, 2012 at 11:32 AM.


