Multimaster?


(project2501) #1

Hi,
Any plans for multimaster capability allowing high volume writes to
local nodes yet query across them as one?

thanks.


(Shay Banon) #2

Not sure I understand the question... . When you create an index, you
specify the number of shards, and number of replicas. By default its 5
shards and 1 replica for each shard. So effectively each index is "multi
mastered".

Note, I suspect that you are asking about multi master of an "index".

On Fri, Dec 30, 2011 at 4:24 PM, project2501 darreng5150@gmail.com wrote:

Hi,
Any plans for multimaster capability allowing high volume writes to
local nodes yet query across them as one?

thanks.


(David Pilato) #3

I think that he is talking about primary shards (where Lucene does the indexing
job) vs. secondary shards (where you can perform searches).

There could be only one shard elected as primary in an ElasticSearch Cluster.
So Indexing with Lucene is done in only one place for a given shard instead.

Shay, if I understood your berlin's talk, that's the way ES works for indexing.

Is that what you were talking about ?

David.

Le 30 décembre 2011 à 16:07, Shay Banon kimchy@gmail.com a écrit :

Not sure I understand the question... . When you create an index, you
specify the number of shards, and number of replicas. By default its 5
shards and 1 replica for each shard. So effectively each index is "multi
mastered".

Note, I suspect that you are asking about multi master of an "index".

On Fri, Dec 30, 2011 at 4:24 PM, project2501 darreng5150@gmail.com wrote:

Hi,
Any plans for multimaster capability allowing high volume writes to
local nodes yet query across them as one?

thanks.
--
David Pilato
http://dev.david.pilato.fr/
Twitter : @dadoonet


(Shay Banon) #4

Thats not entirely true on how elasticsearch works. Basically, an index is
broken down into "replica sets" or partitions. By default, that number if
5. Each partition has at least one shard, and possible extra replicas. A
partition has an elected "primary" shard that all indexed go through it,
but then also applied to any replica shards.

Here is a simple video showing how it works:
http://www.elasticsearch.org/videos/2010/02/08/es-distributed-diagram.html.

And the berlin buzzwords talk that goes into more depth:
http://www.elasticsearch.org/videos/2011/08/09/road-to-a-distributed-searchengine-berlinbuzzwords.html
.

On Fri, Dec 30, 2011 at 5:25 PM, david@pilato.fr david@pilato.fr wrote:

**

I think that he is talking about primary shards (where Lucene does the
indexing job) vs. secondary shards (where you can perform searches).

There could be only one shard elected as primary in an ElasticSearch
Cluster.

So Indexing with Lucene is done in only one place for a given shard
instead.

Shay, if I understood your berlin's talk, that's the way ES works for
indexing.

Is that what you were talking about ?

David.

Le 30 décembre 2011 à 16:07, Shay Banon kimchy@gmail.com a écrit :

Not sure I understand the question... . When you create an index, you
specify the number of shards, and number of replicas. By default its 5
shards and 1 replica for each shard. So effectively each index is "multi
mastered".

Note, I suspect that you are asking about multi master of an "index".

On Fri, Dec 30, 2011 at 4:24 PM, project2501 darreng5150@gmail.com
wrote:

Hi,
Any plans for multimaster capability allowing high volume writes to
local nodes yet query across them as one?

thanks.

--
David Pilato
http://dev.david.pilato.fr/
Twitter : @dadoonet


(project2501) #5

That part I get. But when write documents to the index, I write them
to a single known node?
Then the indexed data is distributed? Is this true?

So would this be a single master in terms of writing/indexing?

In my use case, I have 100 servers each with 50 threads all needing to
perform writes.
That's 5000 index writes per second. When I say multimaster, I mean
the ability to perform
those index writes to local nodes thereby only 50 threads per node
performing writes.
Yet, gain the ability to do distributed searches as ES normally would.

Does this make more sense?

On Dec 30, 10:31 am, Shay Banon kim...@gmail.com wrote:

Thats not entirely true on how elasticsearch works. Basically, an index is
broken down into "replica sets" or partitions. By default, that number if
5. Each partition has at least one shard, and possible extra replicas. A
partition has an elected "primary" shard that all indexed go through it,
but then also applied to any replica shards.

Here is a simple video showing how it works:http://www.elasticsearch.org/videos/2010/02/08/es-distributed-diagram....

And the berlin buzzwords talk that goes into more depth:http://www.elasticsearch.org/videos/2011/08/09/road-to-a-distributed-...
.

On Fri, Dec 30, 2011 at 5:25 PM, da...@pilato.fr da...@pilato.fr wrote:

**

I think that he is talking about primary shards (where Lucene does the
indexing job) vs. secondary shards (where you can perform searches).

There could be only one shard elected as primary in an ElasticSearch
Cluster.

So Indexing with Lucene is done in only one place for a given shard
instead.

Shay, if I understood your berlin's talk, that's the way ES works for
indexing.

Is that what you were talking about ?

David.

Le 30 décembre 2011 à 16:07, Shay Banon kim...@gmail.com a écrit :

Not sure I understand the question... . When you create an index, you
specify the number of shards, and number of replicas. By default its 5
shards and 1 replica for each shard. So effectively each index is "multi
mastered".

Note, I suspect that you are asking about multi master of an "index".

On Fri, Dec 30, 2011 at 4:24 PM, project2501 darreng5...@gmail.com
wrote:

Hi,
Any plans for multimaster capability allowing high volume writes to
local nodes yet query across them as one?

thanks.

--
David Pilato
http://dev.david.pilato.fr/
Twitter : @dadoonet


(Shay Banon) #6

To be honest, I don't really understand what you are trying to say... . In
elasticsearch writing to a specific node is not really relevant, since it
will be redirected to where the shard that needs to be indexed exists on.
See the vids, I don't think you "get" that part...

On Fri, Dec 30, 2011 at 5:45 PM, project2501 darreng5150@gmail.com wrote:

That part I get. But when write documents to the index, I write them
to a single known node?
Then the indexed data is distributed? Is this true?

So would this be a single master in terms of writing/indexing?

In my use case, I have 100 servers each with 50 threads all needing to
perform writes.
That's 5000 index writes per second. When I say multimaster, I mean
the ability to perform
those index writes to local nodes thereby only 50 threads per node
performing writes.
Yet, gain the ability to do distributed searches as ES normally would.

Does this make more sense?

On Dec 30, 10:31 am, Shay Banon kim...@gmail.com wrote:

Thats not entirely true on how elasticsearch works. Basically, an index
is
broken down into "replica sets" or partitions. By default, that number if
5. Each partition has at least one shard, and possible extra replicas. A
partition has an elected "primary" shard that all indexed go through it,
but then also applied to any replica shards.

Here is a simple video showing how it works:
http://www.elasticsearch.org/videos/2010/02/08/es-distributed-diagram....

And the berlin buzzwords talk that goes into more depth:
http://www.elasticsearch.org/videos/2011/08/09/road-to-a-distributed-...
.

On Fri, Dec 30, 2011 at 5:25 PM, da...@pilato.fr da...@pilato.fr
wrote:

**

I think that he is talking about primary shards (where Lucene does the
indexing job) vs. secondary shards (where you can perform searches).

There could be only one shard elected as primary in an ElasticSearch
Cluster.

So Indexing with Lucene is done in only one place for a given shard
instead.

Shay, if I understood your berlin's talk, that's the way ES works for
indexing.

Is that what you were talking about ?

David.

Le 30 décembre 2011 à 16:07, Shay Banon kim...@gmail.com a écrit :

Not sure I understand the question... . When you create an index, you
specify the number of shards, and number of replicas. By default its
5

shards and 1 replica for each shard. So effectively each index is
"multi

mastered".

Note, I suspect that you are asking about multi master of an "index".

On Fri, Dec 30, 2011 at 4:24 PM, project2501 darreng5...@gmail.com
wrote:

Hi,
Any plans for multimaster capability allowing high volume writes
to

local nodes yet query across them as one?

thanks.

--
David Pilato
http://dev.david.pilato.fr/
Twitter : @dadoonet


(Stanislas Polu) #7

Looks like being able to force the location of each shard and specify the sharding strategy would meet his requirement. Right?

-stan

On Dec 30, 2011, at 5:18 PM, Shay Banon kimchy@gmail.com wrote:

To be honest, I don't really understand what you are trying to say... . In elasticsearch writing to a specific node is not really relevant, since it will be redirected to where the shard that needs to be indexed exists on. See the vids, I don't think you "get" that part...

On Fri, Dec 30, 2011 at 5:45 PM, project2501 darreng5150@gmail.com wrote:
That part I get. But when write documents to the index, I write them
to a single known node?
Then the indexed data is distributed? Is this true?

So would this be a single master in terms of writing/indexing?

In my use case, I have 100 servers each with 50 threads all needing to
perform writes.
That's 5000 index writes per second. When I say multimaster, I mean
the ability to perform
those index writes to local nodes thereby only 50 threads per node
performing writes.
Yet, gain the ability to do distributed searches as ES normally would.

Does this make more sense?

On Dec 30, 10:31 am, Shay Banon kim...@gmail.com wrote:

Thats not entirely true on how elasticsearch works. Basically, an index is
broken down into "replica sets" or partitions. By default, that number if
5. Each partition has at least one shard, and possible extra replicas. A
partition has an elected "primary" shard that all indexed go through it,
but then also applied to any replica shards.

Here is a simple video showing how it works:http://www.elasticsearch.org/videos/2010/02/08/es-distributed-diagram....

And the berlin buzzwords talk that goes into more depth:http://www.elasticsearch.org/videos/2011/08/09/road-to-a-distributed-...
.

On Fri, Dec 30, 2011 at 5:25 PM, da...@pilato.fr da...@pilato.fr wrote:

**

I think that he is talking about primary shards (where Lucene does the
indexing job) vs. secondary shards (where you can perform searches).

There could be only one shard elected as primary in an ElasticSearch
Cluster.

So Indexing with Lucene is done in only one place for a given shard
instead.

Shay, if I understood your berlin's talk, that's the way ES works for
indexing.

Is that what you were talking about ?

David.

Le 30 décembre 2011 à 16:07, Shay Banon kim...@gmail.com a écrit :

Not sure I understand the question... . When you create an index, you
specify the number of shards, and number of replicas. By default its 5
shards and 1 replica for each shard. So effectively each index is "multi
mastered".

Note, I suspect that you are asking about multi master of an "index".

On Fri, Dec 30, 2011 at 4:24 PM, project2501 darreng5...@gmail.com
wrote:

Hi,
Any plans for multimaster capability allowing high volume writes to
local nodes yet query across them as one?

thanks.

--
David Pilato
http://dev.david.pilato.fr/
Twitter : @dadoonet


(project2501) #8

What I'm saying is that 5000+ simultaneous connections to a single
node's port prior to redirection won't scale for indexing. Will it?

I understand the redirection. But there are practical limits to the
number of open sockets a "single" port can handle at one time.
So a single write "redirector" can run into scaling issues when you
have 1000's of connections at once. I believe ES relies on a single
port on a single node for index writes (which it thus redirects). Is
that true?

Multimaster solves this problem.

In my use case there will be more throttling at index time, than query
time.
My understanding also is that queries can occur anywhere in the
cluster and be consistent. So the problem doesn't lie there.

I'm still learning ES though. I read the materials. :slight_smile:

On Dec 30, 11:18 am, Shay Banon kim...@gmail.com wrote:

To be honest, I don't really understand what you are trying to say... . In
elasticsearch writing to a specific node is not really relevant, since it
will be redirected to where the shard that needs to be indexed exists on.
See the vids, I don't think you "get" that part...

On Fri, Dec 30, 2011 at 5:45 PM, project2501 darreng5...@gmail.com wrote:

That part I get. But when write documents to the index, I write them
to a single known node?
Then the indexed data is distributed? Is this true?

So would this be a single master in terms of writing/indexing?

In my use case, I have 100 servers each with 50 threads all needing to
perform writes.
That's 5000 index writes per second. When I say multimaster, I mean
the ability to perform
those index writes to local nodes thereby only 50 threads per node
performing writes.
Yet, gain the ability to do distributed searches as ES normally would.

Does this make more sense?

On Dec 30, 10:31 am, Shay Banon kim...@gmail.com wrote:

Thats not entirely true on how elasticsearch works. Basically, an index
is
broken down into "replica sets" or partitions. By default, that number if
5. Each partition has at least one shard, and possible extra replicas. A
partition has an elected "primary" shard that all indexed go through it,
but then also applied to any replica shards.

Here is a simple video showing how it works:
http://www.elasticsearch.org/videos/2010/02/08/es-distributed-diagram....

And the berlin buzzwords talk that goes into more depth:
http://www.elasticsearch.org/videos/2011/08/09/road-to-a-distributed-...
.

On Fri, Dec 30, 2011 at 5:25 PM, da...@pilato.fr da...@pilato.fr
wrote:

**

I think that he is talking about primary shards (where Lucene does the
indexing job) vs. secondary shards (where you can perform searches).

There could be only one shard elected as primary in an ElasticSearch
Cluster.

So Indexing with Lucene is done in only one place for a given shard
instead.

Shay, if I understood your berlin's talk, that's the way ES works for
indexing.

Is that what you were talking about ?

David.

Le 30 décembre 2011 à 16:07, Shay Banon kim...@gmail.com a écrit :

Not sure I understand the question... . When you create an index, you
specify the number of shards, and number of replicas. By default its
5

shards and 1 replica for each shard. So effectively each index is
"multi

mastered".

Note, I suspect that you are asking about multi master of an "index".

On Fri, Dec 30, 2011 at 4:24 PM, project2501 darreng5...@gmail.com
wrote:

Hi,
Any plans for multimaster capability allowing high volume writes
to

local nodes yet query across them as one?

thanks.

--
David Pilato
http://dev.david.pilato.fr/
Twitter : @dadoonet


(project2501) #9

Yes. In other words, let me handle the shard writing decisions at
index time in such a way that I can control the topology. This
could allow all the servers to work in a "shared nothing" state as
much as possible. The shard topology can still be globally known but
bypassing a single port/server for redirection will greatly improve
write scalability.

On Dec 30, 11:32 am, Stanislas Polu polu.stanis...@gmail.com wrote:

Looks like being able to force the location of each shard and specify the sharding strategy would meet his requirement. Right?

-stan

On Dec 30, 2011, at 5:18 PM, Shay Banon kim...@gmail.com wrote:

To be honest, I don't really understand what you are trying to say... . In elasticsearch writing to a specific node is not really relevant, since it will be redirected to where the shard that needs to be indexed exists on. See the vids, I don't think you "get" that part...

On Fri, Dec 30, 2011 at 5:45 PM, project2501 darreng5...@gmail.com wrote:
That part I get. But when write documents to the index, I write them
to a single known node?
Then the indexed data is distributed? Is this true?

So would this be a single master in terms of writing/indexing?

In my use case, I have 100 servers each with 50 threads all needing to
perform writes.
That's 5000 index writes per second. When I say multimaster, I mean
the ability to perform
those index writes to local nodes thereby only 50 threads per node
performing writes.
Yet, gain the ability to do distributed searches as ES normally would.

Does this make more sense?

On Dec 30, 10:31 am, Shay Banon kim...@gmail.com wrote:

Thats not entirely true on how elasticsearch works. Basically, an index is
broken down into "replica sets" or partitions. By default, that number if
5. Each partition has at least one shard, and possible extra replicas. A
partition has an elected "primary" shard that all indexed go through it,
but then also applied to any replica shards.

Here is a simple video showing how it works:http://www.elasticsearch.org/videos/2010/02/08/es-distributed-diagram....

And the berlin buzzwords talk that goes into more depth:http://www.elasticsearch.org/videos/2011/08/09/road-to-a-distributed-...
.

On Fri, Dec 30, 2011 at 5:25 PM, da...@pilato.fr da...@pilato.fr wrote:

**

I think that he is talking about primary shards (where Lucene does the
indexing job) vs. secondary shards (where you can perform searches).

There could be only one shard elected as primary in an ElasticSearch
Cluster.

So Indexing with Lucene is done in only one place for a given shard
instead.

Shay, if I understood your berlin's talk, that's the way ES works for
indexing.

Is that what you were talking about ?

David.

Le 30 décembre 2011 à 16:07, Shay Banon kim...@gmail.com a écrit :

Not sure I understand the question... . When you create an index, you
specify the number of shards, and number of replicas. By default its 5
shards and 1 replica for each shard. So effectively each index is "multi
mastered".

Note, I suspect that you are asking about multi master of an "index".

On Fri, Dec 30, 2011 at 4:24 PM, project2501 darreng5...@gmail.com
wrote:

Hi,
Any plans for multimaster capability allowing high volume writes to
local nodes yet query across them as one?

thanks.

--
David Pilato
http://dev.david.pilato.fr/
Twitter : @dadoonet


(Berkay Mollamustafaoglu-2) #10

Just to note, you could do this currently by using index aliases. You can
create multiple indices and create an alias (that combines multiple
indices). Different clients can write to different indices in parallel and
you can search across all the indices using the alias.
Sharding gives you the option to distribute data automatically but the
control is also available via aliases and ES capability to search across
indices.

Regards,
Berkay Mollamustafaoglu
mberkay on yahoo, google and skype

On Fri, Dec 30, 2011 at 11:39 AM, project2501 darreng5150@gmail.com wrote:

Yes. In other words, let me handle the shard writing decisions at
index time in such a way that I can control the topology. This
could allow all the servers to work in a "shared nothing" state as
much as possible. The shard topology can still be globally known but
bypassing a single port/server for redirection will greatly improve
write scalability.

On Dec 30, 11:32 am, Stanislas Polu polu.stanis...@gmail.com wrote:

Looks like being able to force the location of each shard and specify
the sharding strategy would meet his requirement. Right?

-stan

On Dec 30, 2011, at 5:18 PM, Shay Banon kim...@gmail.com wrote:

To be honest, I don't really understand what you are trying to say...
. In elasticsearch writing to a specific node is not really relevant, since
it will be redirected to where the shard that needs to be indexed exists
on. See the vids, I don't think you "get" that part...

On Fri, Dec 30, 2011 at 5:45 PM, project2501 darreng5...@gmail.com
wrote:

That part I get. But when write documents to the index, I write them
to a single known node?
Then the indexed data is distributed? Is this true?

So would this be a single master in terms of writing/indexing?

In my use case, I have 100 servers each with 50 threads all needing to
perform writes.
That's 5000 index writes per second. When I say multimaster, I mean
the ability to perform
those index writes to local nodes thereby only 50 threads per node
performing writes.
Yet, gain the ability to do distributed searches as ES normally would.

Does this make more sense?

On Dec 30, 10:31 am, Shay Banon kim...@gmail.com wrote:

Thats not entirely true on how elasticsearch works. Basically, an
index is

broken down into "replica sets" or partitions. By default, that
number if

  1. Each partition has at least one shard, and possible extra
    replicas. A

partition has an elected "primary" shard that all indexed go through
it,

but then also applied to any replica shards.

Here is a simple video showing how it works:
http://www.elasticsearch.org/videos/2010/02/08/es-distributed-diagram....

And the berlin buzzwords talk that goes into more depth:
http://www.elasticsearch.org/videos/2011/08/09/road-to-a-distributed-...

.

On Fri, Dec 30, 2011 at 5:25 PM, da...@pilato.fr da...@pilato.fr
wrote:

**

I think that he is talking about primary shards (where Lucene does
the

indexing job) vs. secondary shards (where you can perform
searches).

There could be only one shard elected as primary in an
ElasticSearch

Cluster.

So Indexing with Lucene is done in only one place for a given shard
instead.

Shay, if I understood your berlin's talk, that's the way ES works
for

indexing.

Is that what you were talking about ?

David.

Le 30 décembre 2011 à 16:07, Shay Banon kim...@gmail.com a
écrit :

Not sure I understand the question... . When you create an
index, you

specify the number of shards, and number of replicas. By default
its 5

shards and 1 replica for each shard. So effectively each index
is "multi

mastered".

Note, I suspect that you are asking about multi master of an
"index".

On Fri, Dec 30, 2011 at 4:24 PM, project2501 <
darreng5...@gmail.com>

wrote:

Hi,
Any plans for multimaster capability allowing high volume
writes to

local nodes yet query across them as one?

thanks.

--
David Pilato
http://dev.david.pilato.fr/
Twitter : @dadoonet


(Shay Banon) #11

First, 5000 open connections is not that bad, but why 5000? If you have 100
clients, then you will have 100 connections (you can multiplex on them, if
you use the Java client, thats what it does). Also, you can round robin
across the server cluster nodes.

On Fri, Dec 30, 2011 at 6:36 PM, project2501 darreng5150@gmail.com wrote:

What I'm saying is that 5000+ simultaneous connections to a single
node's port prior to redirection won't scale for indexing. Will it?

I understand the redirection. But there are practical limits to the
number of open sockets a "single" port can handle at one time.
So a single write "redirector" can run into scaling issues when you
have 1000's of connections at once. I believe ES relies on a single
port on a single node for index writes (which it thus redirects). Is
that true?

Multimaster solves this problem.

http://en.wikipedia.org/wiki/Multi-master_replication

In my use case there will be more throttling at index time, than query
time.
My understanding also is that queries can occur anywhere in the
cluster and be consistent. So the problem doesn't lie there.

I'm still learning ES though. I read the materials. :slight_smile:

On Dec 30, 11:18 am, Shay Banon kim...@gmail.com wrote:

To be honest, I don't really understand what you are trying to say... .
In
elasticsearch writing to a specific node is not really relevant, since it
will be redirected to where the shard that needs to be indexed exists on.
See the vids, I don't think you "get" that part...

On Fri, Dec 30, 2011 at 5:45 PM, project2501 darreng5...@gmail.com
wrote:

That part I get. But when write documents to the index, I write them
to a single known node?
Then the indexed data is distributed? Is this true?

So would this be a single master in terms of writing/indexing?

In my use case, I have 100 servers each with 50 threads all needing to
perform writes.
That's 5000 index writes per second. When I say multimaster, I mean
the ability to perform
those index writes to local nodes thereby only 50 threads per node
performing writes.
Yet, gain the ability to do distributed searches as ES normally would.

Does this make more sense?

On Dec 30, 10:31 am, Shay Banon kim...@gmail.com wrote:

Thats not entirely true on how elasticsearch works. Basically, an
index

is

broken down into "replica sets" or partitions. By default, that
number if

  1. Each partition has at least one shard, and possible extra
    replicas. A

partition has an elected "primary" shard that all indexed go through
it,

but then also applied to any replica shards.

Here is a simple video showing how it works:
http://www.elasticsearch.org/videos/2010/02/08/es-distributed-diagram..
..

And the berlin buzzwords talk that goes into more depth:
http://www.elasticsearch.org/videos/2011/08/09/road-to-a-distributed-.
..

.

On Fri, Dec 30, 2011 at 5:25 PM, da...@pilato.fr da...@pilato.fr
wrote:

**

I think that he is talking about primary shards (where Lucene does
the

indexing job) vs. secondary shards (where you can perform
searches).

There could be only one shard elected as primary in an
ElasticSearch

Cluster.

So Indexing with Lucene is done in only one place for a given shard
instead.

Shay, if I understood your berlin's talk, that's the way ES works
for

indexing.

Is that what you were talking about ?

David.

Le 30 décembre 2011 à 16:07, Shay Banon kim...@gmail.com a
écrit :

Not sure I understand the question... . When you create an
index, you

specify the number of shards, and number of replicas. By default
its

5

shards and 1 replica for each shard. So effectively each index is
"multi

mastered".

Note, I suspect that you are asking about multi master of an
"index".

On Fri, Dec 30, 2011 at 4:24 PM, project2501 <
darreng5...@gmail.com>

wrote:

Hi,
Any plans for multimaster capability allowing high volume
writes

to

local nodes yet query across them as one?

thanks.

--
David Pilato
http://dev.david.pilato.fr/
Twitter : @dadoonet


(Karel Minarik) #12

On Dec 30, 5:36 pm, project2501 darreng5...@gmail.com wrote:

I understand the redirection. But there are practical limits to the
number of open sockets a "single" port can handle at one time.
So a single write "redirector" can run into scaling issues when you
have 1000's of connections at once. I believe ES relies on a single
port on a single node for index writes (which it thus redirects).

I don't understand the exact issue you seem to be worried about, but
do note that you can write to any node in the cluster (which has
http enabled). The changes will be propagated to all the replicas,
quite similar to the “multi-master” scenario (eg. in something like
CouchDB). Maybe check out the relevant configs:
https://github.com/elasticsearch/elasticsearch/blob/master/config/elasticsearch.yml#L35

Karel


(system) #13