I'm trying to shard my Elasticsearch cluster, and I came across this
documentation: http://www.elasticsearch.org/guide/reference/modules/cluster.html
I understand most of the explanation. The example uses 2 nodes per
rack_id; I wonder whether we can use 1 node per rack_id instead. I want to
avoid confusion in the setup, so each node.rack_id would represent
one node instead of one rack (or a set of servers).
What do you think?
On Tuesday, October 23, 2012 12:21:09 AM UTC-7, Radu Gheorghe wrote:
Hello Terence,
Let me see if I understood your question correctly: you want to set
one rack ID per node, to make sure that a shard and its replica won't
end up on the same node. While that should be possible, you shouldn't
have to do anything to achieve that result.
My understanding of the default behavior is the following: once a shard
is allocated to a node, ES looks for a different node on which to place
its next unassigned replica. If no other node is available, the replica
remains unassigned (cluster state yellow). Attributes like rack_id
come into play when you want to make sure a shard and its
replica don't rely on the same group of servers (rack, zone, etc.), so
that if the whole group goes down you still have a working copy of
your data.
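To make the point about groups of servers concrete, here is a minimal Python sketch. It is not Elasticsearch code: the node names, the rack names and the surviving_copies helper are all made up for illustration. It only checks which copies of one shard are left after a whole rack fails.

```python
def surviving_copies(copy_nodes, rack_of, failed_rack):
    """Return the nodes that still hold a copy after a whole rack fails."""
    return [node for node in copy_nodes if rack_of[node] != failed_rack]


# Hypothetical layout: two racks with two nodes each.
rack_of = {"node1": "rack_a", "node2": "rack_a",
           "node3": "rack_b", "node4": "rack_b"}

# Primary and replica on different nodes, but in the same rack.
same_rack = ["node1", "node2"]
# Primary and replica on different nodes in different racks.
split_racks = ["node1", "node3"]

print(surviving_copies(same_rack, rack_of, "rack_a"))    # []
print(surviving_copies(split_racks, rack_of, "rack_a"))  # ['node3']
```

In the first layout the shard is lost when rack_a goes down even though its two copies sat on different nodes. That is the case a rack-level attribute is meant to prevent.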
If I didn't understand your question correctly, can you please rephrase it?
The reason I want to set one node per rack/zone is that adding an extra layer
of racks/zones could be confusing to manage. If I set one node per
rack/zone, then I only need to worry about shards and replicas per index.
Does that make sense?
Thanks,
Terence
On Tuesday, October 23, 2012 11:23:49 PM UTC-7, Radu Gheorghe wrote:
No, not really. Maybe it's just me.
What exactly are you trying to achieve here? To make sure that a shard
and its replica don't end up on the same node, or...?
I'm trying to set up a distributed, sharded ES cluster that provides data
redundancy and improves read and write performance. Maybe I should ask the
following question instead:
What is the difference between (10 shards + 2 replicas with 2 rack_ids in total
and 10 nodes per rack_id) and (10 shards + 2 replicas with 20 rack_ids in total
and 1 node per rack_id)?
Either solution will use 20 nodes. My original question was leaning
toward the latter, but before I start implementing it, I would
like to understand the difference between the two.
Thanks for your help,
Terence
On Wednesday, October 24, 2012 11:34:48 AM UTC-7, Radu Gheorghe wrote:
Hello Terence,
Please note that 10 shards + 2 replicas per shard will get you a total
of 30 shards. To get 20, you need 1 replica per shard.
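The arithmetic behind those numbers, as a tiny Python sketch (the total_shards helper is just an illustrative name, not an Elasticsearch API): every primary counts once, plus once for each of its replicas.

```python
def total_shards(primaries, replicas_per_shard):
    """Total shard copies in an index: each primary plus its replicas."""
    return primaries * (1 + replicas_per_shard)


print(total_shards(10, 2))  # 30
print(total_shards(10, 1))  # 20
```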
Back to your question: assuming you have 20 nodes and 20 shards in
total, the configuration with 20 rack_ids would behave the same as one
with no rack_ids at all: one shard per node, and any shard might end
up on any node.
If you have 2 rack_ids, each shard's primary will end up on a node in
one rack and its replica on a node in the other rack.
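One way to see why 20 rack_ids change nothing is to count the placements that are allowed for a single shard and its replica. The Python sketch below is a toy model, not the real allocator: it assumes rack awareness acts as a hard "primary and replica must be in different racks" rule, and the node names, rack names and allowed_pairs helper are invented for the example.

```python
from itertools import permutations


def allowed_pairs(rack_of, use_awareness):
    """Count ordered (primary node, replica node) choices for one shard."""
    count = 0
    # permutations() only yields pairs of two different nodes, which models
    # the default rule that a shard and its replica never share a node.
    for primary, replica in permutations(rack_of, 2):
        if use_awareness and rack_of[primary] == rack_of[replica]:
            continue  # awareness: skip pairs that share a rack
        count += 1
    return count


nodes = [f"node{i}" for i in range(20)]
one_rack_per_node = {n: f"rack{i}" for i, n in enumerate(nodes)}
two_racks = {n: f"rack{i % 2}" for i, n in enumerate(nodes)}

print(allowed_pairs(one_rack_per_node, use_awareness=False))  # 380
print(allowed_pairs(one_rack_per_node, use_awareness=True))   # 380
print(allowed_pairs(two_racks, use_awareness=True))           # 200
```

With one rack per node the rack rule removes nothing that the node rule had not already removed, so both counts are 380. With 2 racks of 10 nodes, only the 200 cross-rack pairs remain.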
Sorry, I actually meant 1 replica per shard. Thanks for the correction.
So my question is: what is the benefit of using rack_ids, since I can
already distribute the shards across all 20 nodes without using rack_id?
Thanks,
Terence
On Wednesday, October 24, 2012 1:16:31 PM UTC-7, Radu Gheorghe wrote:
Hello Terence,
If you only do that, there's no benefit. It only makes sense to use
such IDs when you want to separate groups of nodes. Individual nodes
are already separated, in Elasticsearch's view. That's why, for
example, when you start a single ES node with one index in the default
configuration (5 shards, 1 replica), the replicas are not allocated: it doesn't
make sense to have a shard and its replica on the same node.
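The single-node case can be sketched in a few lines of Python. This is a simplified model under the assumption that the only rule is "one copy of a given shard per node"; the cluster_health function is a hypothetical helper for illustration, not the real health API.

```python
def cluster_health(num_nodes, primaries, replicas_per_shard):
    """Return (status, unassigned copies) under the one-copy-per-node rule."""
    copies_per_shard = 1 + replicas_per_shard
    # Each copy of a shard needs its own node, so extra copies stay unassigned.
    placed_per_shard = min(num_nodes, copies_per_shard)
    unassigned = primaries * (copies_per_shard - placed_per_shard)
    if placed_per_shard == 0:
        status = "red"      # not even the primaries can be allocated
    elif unassigned > 0:
        status = "yellow"   # primaries are allocated, some replicas are not
    else:
        status = "green"    # every copy is allocated
    return status, unassigned


print(cluster_health(1, 5, 1))  # ('yellow', 5)
print(cluster_health(2, 5, 1))  # ('green', 0)
```

With one node, the 5 replicas have nowhere to go and the cluster stays yellow. Adding a second node is enough to place them, with no rack_id involved.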