Uneven Primary Shard Distribution in Cluster


(Andrew Ruslander) #1

I have a five node Elasticsearch cluster set up so that two nodes are in
one zone and the other three nodes are in a different zone (let's call them
SideA and SideB) via use of the forced awareness attributes. I also have a
sixth node that has Logstash on it. Logstash is outputting to one of the
two nodes in SideA. However, when I use Marvel to view the shard
allocation across my five node cluster, I see that probably 95% of the
primary shards are all on SideB. SideA is almost exclusively replicas.
Just out of curiosity, I changed Logstash to output to a node in SideB,
but the primaries continued to be allocated to SideB only.
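
To put a number on what Marvel shows, one option is to count primaries per zone from the text output of the `_cat/shards` API. The following is a minimal sketch, assuming the output was requested with the columns `index,shard,prirep,node` in that order. The node names, the sample rows, and the `NODE_ZONE` mapping are hypothetical and would need to match the real cluster.

```python
from collections import Counter

# Hypothetical mapping from node name to awareness zone; adjust to your cluster.
NODE_ZONE = {
    "a1": "SideA", "a2": "SideA",
    "b1": "SideB", "b2": "SideB", "b3": "SideB",
}


def primaries_per_zone(cat_shards_text, node_zone):
    """Count primary shards per zone from `_cat/shards`-style text.

    Expects whitespace-separated columns in the order
    index, shard, prirep, node (node names may contain spaces).
    """
    counts = Counter()
    for line in cat_shards_text.splitlines():
        parts = line.split()
        if len(parts) < 4:
            continue  # blank line, or an unassigned shard with no node column
        prirep = parts[2]
        node = " ".join(parts[3:])
        if prirep == "p":
            counts[node_zone.get(node, "unknown")] += 1
    return counts


# Made-up sample rows, not taken from the cluster discussed here.
SAMPLE = """\
logstash-2014.08.12 0 p b1
logstash-2014.08.12 0 r a1
logstash-2014.08.12 1 p b2
logstash-2014.08.12 1 r a2
logstash-2014.08.12 2 p a1
logstash-2014.08.12 2 r b3
"""

print(dict(primaries_per_zone(SAMPLE, NODE_ZONE)))
```

Replica rows (`r`) are ignored, so the result reflects only where primaries currently live.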

So my question is...is this expected behavior? What would cause the
primary shards to be allocated only to one side? Is it because it has
three nodes versus the two nodes on SideA? Is something else afoot here?

  • Andrew

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/dba612d0-adb9-4bca-8c67-3b35e9236bc7%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


(Jörg Prante) #2

That's ok, this is how Elasticsearch works. There is no need to randomize
or shuffle primaries. They have exactly the same work to do as replicas.
Replicas are promoted to primaries automatically on demand.
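
The promotion behaviour can be pictured with a toy model. This is only an illustration of the idea, not Elasticsearch's actual allocation code; the function name, the dictionary layout, and the node names are made up for the example.

```python
def promote_on_failure(copies, failed_node):
    """Toy model of replica promotion for a single shard.

    `copies` is a list of {"node": str, "primary": bool} entries.
    Copies on the failed node are dropped; if the primary was among
    them, the first surviving replica is promoted.
    """
    survivors = [dict(c) for c in copies if c["node"] != failed_node]
    if survivors and not any(c["primary"] for c in survivors):
        survivors[0]["primary"] = True
    return survivors


# Hypothetical shard with its primary on SideB and its replica on SideA.
shard0 = [
    {"node": "b1", "primary": True},
    {"node": "a1", "primary": False},
]

print(promote_on_failure(shard0, "b1"))  # the copy on a1 becomes the primary
print(promote_on_failure(shard0, "a1"))  # the primary on b1 is unaffected
```

In this model the role of a copy is just a flag that moves when needed, which is why an uneven spread of primaries is not a problem in itself.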

Jörg

On Tue, Aug 12, 2014 at 4:24 PM, Andrew Ruslander <andrew.ruslander@gmail.com> wrote:

I have a five node Elasticsearch cluster set up so that two nodes are in
one zone and the other three nodes are in a different zone (let's call them
SideA and SideB) via use of the forced awareness attributes. I also have a
sixth node that has Logstash on it. Logstash is outputting to one of the
two nodes in SideA. However, when I use Marvel to view the shard
allocation across my five node cluster, I see that probably 95% of the
primary shards are all on SideB. SideA is almost exclusively replicas.
Just out of curiosity, I changed Logstash to output to a node in SideB,
but the primaries continued to be allocated to SideB only.

So my question is...is this expected behavior? What would cause the
primary shards to be allocated only to one side? Is it because it has
three nodes versus the two nodes on SideA? Is something else afoot here?

  • Andrew



(Andrew Ruslander) #3

Thanks for the response, Jörg. My only concern: what if the two sides are
located a non-trivial geographic distance from each other? If all the
primaries live on SideB and a large volume of updates comes into a node on
SideA, doesn't that node have to forward all of them to the node holding
the primary on SideB? I worry about latency there.
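
The forwarding path in question can be sketched with a rough latency model. It assumes one replica per shard (so forced awareness places one copy in each zone), that the node receiving the request forwards it to the primary, and that the primary waits for the replica before acknowledging. The round-trip figures are invented for illustration.

```python
def write_latency_ms(client_zone, primary_zone, replica_zone,
                     wan_rtt_ms, lan_rtt_ms=1.0):
    """Toy estimate of network time for one write.

    One round trip from the receiving node to the primary, plus one
    round trip from the primary to the replica. Processing time is ignored.
    """
    def rtt(zone_a, zone_b):
        return lan_rtt_ms if zone_a == zone_b else wan_rtt_ms

    return rtt(client_zone, primary_zone) + rtt(primary_zone, replica_zone)


# Hypothetical 80 ms round trip between the two sides.
local_primary = write_latency_ms("SideA", "SideA", "SideB", wan_rtt_ms=80.0)
remote_primary = write_latency_ms("SideA", "SideB", "SideA", wan_rtt_ms=80.0)
print(local_primary, remote_primary)
```

Under these assumptions every write crosses the inter-zone link at least once wherever the primary lives, and a primary on the far side adds a second crossing.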

  • Andrew

On Tuesday, August 12, 2014 10:49:55 AM UTC-4, Jörg Prante wrote:

That's ok, this is how Elasticsearch works. There is no need to randomize
or shuffle primaries. They have exactly the same work to do as replicas.
Replicas are promoted to primaries automatically on demand.

Jörg



(Jörg Prante) #4

Latency is an issue, you are right. But that is not related to
primary/replica distribution.

If the cluster state changes, e.g. when new field names arrive, the master
must be reached very quickly for the update, and the master then pushes the
new state out to all nodes. It is crucial that state propagation completes
near-instantaneously before indexing continues.
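
A toy model shows why one slow link matters here. It assumes the node that sees the new field contacts the master once, and that the master then sends the new state to all nodes in parallel and waits for the slowest one. The function name and the round-trip times are hypothetical, and real timings depend on much more than network latency.

```python
def mapping_update_delay_ms(rtt_to_master_ms, master_to_node_rtts_ms):
    """Toy estimate of the network delay for one cluster state change.

    One round trip to reach the master, plus the slowest round trip
    from the master to any node while the new state is pushed out.
    """
    return rtt_to_master_ms + max(master_to_node_rtts_ms.values())


# Hypothetical numbers: master on SideB, indexing node on SideA, 80 ms apart.
split_cluster = mapping_update_delay_ms(
    80.0, {"a1": 80.0, "a2": 80.0, "b1": 1.0, "b2": 1.0}
)
# Same cluster with every node on one low-latency network.
single_site = mapping_update_delay_ms(
    1.0, {"a1": 1.0, "a2": 1.0, "b1": 1.0, "b2": 1.0}
)
print(split_cluster, single_site)
```

In this model the delay is set by the slowest link, regardless of which side holds the primaries.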

At the moment, there is no good solution for cross-continental networking,
since ES requires low latency networking. The best I can imagine is to set
up two clusters and sync them with an extra tool over a high latency line.
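
What such a sync tool might do at its core can be sketched as follows: turn documents read from one cluster into a request body for the bulk API of the other. This is a sketch only. The function name and the sample document are made up, the transport (reading from the source and sending to the target) is left out, and the `_type` field is included on the assumption of an Elasticsearch version from the time of this thread.

```python
import json


def to_bulk_body(docs):
    """Build a newline-delimited bulk request body from documents.

    Each document is a dict with `_index`, `_type`, `_id` and `_source`,
    and becomes two lines: an action line and a source line.
    """
    lines = []
    for doc in docs:
        action = {
            "index": {
                "_index": doc["_index"],
                "_type": doc["_type"],
                "_id": doc["_id"],
            }
        }
        lines.append(json.dumps(action))
        lines.append(json.dumps(doc["_source"]))
    # The bulk format expects a trailing newline after the last line.
    return "\n".join(lines) + "\n" if lines else ""


# Hypothetical document as it might be read from the source cluster.
docs = [
    {
        "_index": "logstash-2014.08.12",
        "_type": "logs",
        "_id": "1",
        "_source": {"message": "hello"},
    }
]
body = to_bulk_body(docs)
print(body, end="")
```

Because only batches travel over the high-latency line, each cluster keeps its own cluster state and low-latency internal network.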

Jörg

On Tue, Aug 12, 2014 at 5:15 PM, Andrew Ruslander <andrew.ruslander@gmail.com> wrote:

Thanks for the response, Jörg. My only concern: what if the two sides are
located a non-trivial geographic distance from each other? If all the
primaries live on SideB and a large volume of updates comes into a node on
SideA, doesn't that node have to forward all of them to the node holding
the primary on SideB? I worry about latency there.

  • Andrew


