Question about Elasticsearch shard zones


(tt) #1

Hi there,

I'm trying to shard an Elasticsearch cluster, and I came across this
documentation:
http://www.elasticsearch.org/guide/reference/modules/cluster.html
I understand most of the explanation. The example uses 2 nodes per
rack_id; I wonder whether we can use 1 node per rack_id instead. I want to
avoid confusion in the setup, so each node.rack_id would represent
one node instead of one rack (or a set of servers).
What do you think?

Thanks,
Terence

--


(Radu Gheorghe) #2

Hello Terence,

Let me see if I understood your question correctly: you want to set
one rack ID per node, to make sure that a shard and its replica won't
end up on the same node. While that should be possible, you shouldn't
have to do anything in order to achieve the desired result.

My understanding of the default behavior is the following: once a shard
is allocated to a node, ES looks for a different node on which to place
each of its replicas. If no other node is available, the replica
remains unassigned (cluster state yellow). Attributes like rack_id
come into play when you want to make sure a shard and its
replica don't rely on the same group of servers (rack, zone, etc.), so
that if the whole group goes down you still have a working copy of
your data.
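
As a rough illustration of how such an attribute is attached to nodes, here is a minimal Python sketch that renders the two settings involved. The setting names (node.rack_id and cluster.routing.allocation.awareness.attributes) follow the cluster module documentation linked in this thread; the host names and rack names are made up, and newer Elasticsearch releases use a node.attr.* form for custom attributes, so check the documentation for your version.

```python
# Toy helper: render the awareness-related settings for each node.
# Setting names follow the cluster module docs linked in this thread.
# Host names and rack names below are hypothetical.

def node_settings(rack_id):
    """Return the settings one node would carry in elasticsearch.yml."""
    return {
        "node.rack_id": rack_id,
        "cluster.routing.allocation.awareness.attributes": "rack_id",
    }


def render_yaml(settings):
    """Render a flat settings dict as simple 'key: value' lines."""
    return "\n".join(f"{key}: {value}" for key, value in settings.items())


# Hypothetical layout: two racks with two nodes each.
layout = {
    "node1": "rack_one",
    "node2": "rack_one",
    "node3": "rack_two",
    "node4": "rack_two",
}

for host, rack in layout.items():
    print(f"# {host}")
    print(render_yaml(node_settings(rack)))
```

Nodes that share a rack_id value are treated as one group, which is why giving every node its own value adds nothing beyond the default per-node separation.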

If I didn't understand your question correctly, can you please rephrase it?

Best regards,
Radu

http://sematext.com/ -- ElasticSearch -- Solr -- Lucene

--


(tt) #3

Hi Radu,

The reason I want to set one node per rack/zone is that adding a new layer
of rack/zone could be confusing to manage. If I set one node per
rack/zone, then I only need to worry about shards and replicas per index.
Does that make sense?

Thanks,
Terence

--


(Radu Gheorghe) #4

Hello Terence,

No, not really. Maybe it's just me :(

What exactly are you trying to achieve here? To make sure that a shard
and its replica don't end up on the same node, or... ?

Best regards,
Radu

http://sematext.com/ -- ElasticSearch -- Solr -- Lucene

--


(tt) #5

Hi Radu,

I'm trying to set up a distributed, sharded ES cluster that provides data
redundancy and improves read and write performance. Maybe I should ask the
following question:

What is the difference between (10 shards + 2 replicas with 2 rack_ids in
total and 10 nodes per rack_id) and (10 shards + 2 replicas with 20 rack_ids
in total and 1 node per rack_id)?

Either solution will consume 20 nodes. My original question was leaning
toward the latter solution, but before I start implementing it, I would
like to understand the difference between the two.

Thanks for your help,
Terence

--


(Radu Gheorghe) #6

Hello Terence,

Please note that 10 shards + 2 replicas per shard will get you a total
of 30 shards. To get 20, you need 1 replica per shard.
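
The arithmetic behind that count can be written out as a small Python sketch (the function name is only illustrative): every primary shard carries its own replicas, so the total number of shard copies is primaries times (1 + replicas per shard).

```python
def total_shard_copies(primaries, replicas_per_shard):
    """Each primary plus its replicas counts as one set of copies."""
    return primaries * (1 + replicas_per_shard)


print(total_shard_copies(10, 2))  # 30
print(total_shard_copies(10, 1))  # 20
```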

Back to your question: assuming that you have 20 nodes and 20 total
shards, the configuration with 20 rack_ids would be the same as one
with no rack_ids at all: one shard per node, and any shard might end
up on any node.

If you have 2 rack_ids, each shard and its replica will end up on
different racks: one copy on a node in one rack, and the other copy on
a node in the other rack.
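
The difference between the two layouts can be sketched with a toy allocator in Python. This is a simplified model under stated assumptions, not Elasticsearch's real allocation algorithm: it places copies greedily on the least loaded node, never puts two copies of a shard on one node, and treats rack awareness as a hard rule. All node and rack names are hypothetical.

```python
def allocate(nodes, primaries, replicas, aware):
    """Greedy toy allocator.

    nodes: dict mapping node name -> rack id.
    Returns a dict mapping shard number -> list of nodes holding a copy.
    """
    load = {name: 0 for name in nodes}
    placement = {}
    for shard in range(primaries):
        holders = []
        for _ in range(1 + replicas):
            used_racks = {nodes[n] for n in holders}
            candidates = [
                n for n in nodes
                # never two copies of one shard on the same node
                if n not in holders
                # with awareness on, also avoid racks already used
                and not (aware and nodes[n] in used_racks)
            ]
            if not candidates:
                break  # this copy stays unassigned
            pick = min(candidates, key=lambda n: load[n])
            holders.append(pick)
            load[pick] += 1
        placement[shard] = holders
    return placement


def racks_per_shard(nodes, placement):
    """Number of distinct racks holding each shard's copies."""
    return {s: len({nodes[n] for n in held}) for s, held in placement.items()}


two_racks = {f"n{i}": ("rack_a" if i < 10 else "rack_b") for i in range(20)}
twenty_racks = {f"n{i}": f"rack_{i}" for i in range(20)}

aware_two = allocate(two_racks, 10, 1, aware=True)
unaware_two = allocate(two_racks, 10, 1, aware=False)
aware_twenty = allocate(twenty_racks, 10, 1, aware=True)
unaware_twenty = allocate(twenty_racks, 10, 1, aware=False)

# With 2 racks and awareness, every shard spans both racks.
print(set(racks_per_shard(two_racks, aware_two).values()))
# Without awareness, a shard and its replica can share a rack.
print(min(racks_per_shard(two_racks, unaware_two).values()))
# With one rack per node, awareness changes nothing.
print(aware_twenty == unaware_twenty)
```

In this model, one rack_id per node adds no constraint beyond the per-node rule, while two rack_ids guarantee that losing a whole rack still leaves one copy of every shard.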

Best regards,
Radu

http://sematext.com/ -- ElasticSearch -- Solr -- Lucene

--


(tt) #7

Sorry, I actually meant 1 replica per shard. Thanks for the correction.

So my question is: what is the benefit of using rack_ids, since I can
already distribute the shards across all 20 nodes without using rack_id?

Thanks,
Terence

--


(Radu Gheorghe) #8

Hello Terence,

If you only do that, there's no benefit. It only makes sense to use
such IDs when you want to separate groups of nodes. Individual nodes
are already separated, in Elasticsearch's view. That's why, for
example, when you start a single ES node with one index in the default
configuration (5 shards, 1 replica), the replicas are not allocated:
it doesn't make sense to have a shard and its replica on the same node.
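
The single-node case can be modelled with a few lines of Python. This is a toy calculation assuming only the rule stated above (copies of the same shard never share a node); the function name and return shape are illustrative, not an Elasticsearch API.

```python
def cluster_health(node_count, primaries=5, replicas=1):
    """Return (status, unassigned copies) for a single index."""
    copies_wanted = 1 + replicas
    if node_count < 1:
        # no node at all: not even the primaries can be placed
        return "red", primaries * copies_wanted
    # each copy of a shard needs its own node
    placed_per_shard = min(node_count, copies_wanted)
    unassigned = primaries * (copies_wanted - placed_per_shard)
    status = "green" if unassigned == 0 else "yellow"
    return status, unassigned


print(cluster_health(1))  # ('yellow', 5)
print(cluster_health(2))  # ('green', 0)
```

Adding a second node is enough for the 5 replicas to be placed, with no rack_id needed.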

Best regards,
Radu

http://sematext.com/ -- ElasticSearch -- Solr -- Lucene

--


(tt) #9

Hi Radu,

I got it working now. Thanks for your help!

Terence

--

