I have a five-node Elasticsearch cluster set up so that two nodes are in
one zone and the other three nodes are in a different zone (let's call them
SideA and SideB), using forced awareness attributes. I also have a
sixth node that runs Logstash. Logstash is outputting to one of the
two nodes in SideA. However, when I use Marvel to view the shard
allocation across my five-node cluster, I see that probably 95% of the
primary shards are on SideB. SideA is almost exclusively replicas.
Just out of curiosity, I changed Logstash to output to a node in SideB,
but the primaries continued to be allocated to SideB only.
So my question is: is this expected behavior? What would cause the
primary shards to be allocated only to one side? Is it because it has
three nodes versus the two nodes on SideA? Is something else afoot here?
That's OK; this is how Elasticsearch works. There is no need to randomize
or shuffle primaries. They have exactly the same work to do as replicas.
Replicas are promoted to primaries automatically when needed.
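For anyone who wants to check the primary/replica split per zone without Marvel, here is a minimal sketch. It assumes the text comes from `GET _cat/shards` in its default column order (index, shard, prirep, state, docs, store, ip, node), and that node names contain no spaces. The node names, IPs, sample rows, and the `NODE_ZONE` mapping are all made up for illustration; they are not taken from the cluster described above.

```python
from collections import Counter

# Hypothetical node-to-zone mapping; these node names are invented.
NODE_ZONE = {
    "a1": "SideA", "a2": "SideA",
    "b1": "SideB", "b2": "SideB", "b3": "SideB",
}

# Made-up sample in the default column layout of `GET _cat/shards`:
# index shard prirep state docs store ip node
SAMPLE = """\
logstash-2014.08.12 0 p STARTED 1200 1.1mb 10.0.0.3 b1
logstash-2014.08.12 0 r STARTED 1200 1.1mb 10.0.0.1 a1
logstash-2014.08.12 1 p STARTED 1180 1.0mb 10.0.0.4 b2
logstash-2014.08.12 1 r STARTED 1180 1.0mb 10.0.0.2 a2
logstash-2014.08.12 2 p STARTED 1210 1.1mb 10.0.0.1 a1
logstash-2014.08.12 2 r STARTED 1210 1.1mb 10.0.0.5 b3
"""


def tally_by_zone(cat_shards_text, node_zone):
    """Count primaries and replicas per zone from _cat/shards-style text."""
    counts = Counter()
    for line in cat_shards_text.splitlines():
        fields = line.split()
        if len(fields) < 8:
            continue  # blank lines and unassigned shards have no node column
        prirep, node = fields[2], fields[7]
        zone = node_zone.get(node, "unknown")
        role = "primary" if prirep == "p" else "replica"
        counts[(zone, role)] += 1
    return counts


counts = tally_by_zone(SAMPLE, NODE_ZONE)
for (zone, role), n in sorted(counts.items()):
    print(zone, role, n)
```

In a real cluster you would paste or fetch the actual `_cat/shards` output and fill `NODE_ZONE` from your own node attributes.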
Thanks for the response, Jörg. My only concern is this: what if the two
sides are located some non-trivial geographic distance from each other? If
all the primaries live on SideB and a large quantity of updates is
coming into a node on SideA, doesn't that node have to forward all of them
to the node holding the primary on SideB? I worry about latency there.
Andrew
(In reply to Jörg Prante, Tuesday, August 12, 2014 10:49:55 AM UTC-4,
replying to Andrew Ruslander, Tue, Aug 12, 2014 at 4:24 PM.)
Latency is an issue, you are right. But that is not related to
primary/replica distribution.
If the cluster state changes, e.g. when new field names arrive, the master
must be reached very quickly for the update, and the master pushes out the
new state to all nodes. It is crucial that state propagation completes as
close to instantaneously as possible before indexing continues.
At the moment, there is no good solution for cross-continental networking,
since ES requires low-latency networking. The best I can imagine is to set
up two clusters and sync them with an extra tool over a high-latency line.
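To put rough numbers on the forwarding concern raised earlier in the thread, here is a back-of-the-envelope sketch. It is a simplified model, not a description of Elasticsearch internals: it assumes one replica per shard, a write that is acknowledged only after the replica has answered, and it ignores intra-zone hops and processing time. The 40 ms one-way latency is an invented figure.

```python
def cross_zone_hops(coordinator_zone, primary_zone, replica_zone):
    """Count one-way cross-zone network hops for one write in this model."""
    hops = 0
    # Request forwarded to the primary, response returned to the coordinator.
    if coordinator_zone != primary_zone:
        hops += 2
    # Primary sends the operation to the replica and waits for its answer.
    if primary_zone != replica_zone:
        hops += 2
    return hops


def added_latency_ms(hops, one_way_ms):
    """Network latency added by the cross-zone hops alone."""
    return hops * one_way_ms


ONE_WAY_MS = 40.0  # assumed one-way latency between SideA and SideB

# Client talks to a SideA node; primary on SideB, replica forced onto SideA.
far = cross_zone_hops("SideA", "SideB", "SideA")
# Same client; primary on SideA, replica forced onto SideB.
near = cross_zone_hops("SideA", "SideA", "SideB")

print("primary on SideB:", added_latency_ms(far, ONE_WAY_MS), "ms")
print("primary on SideA:", added_latency_ms(near, ONE_WAY_MS), "ms")
```

Under these assumptions a remote primary costs one extra cross-zone round trip, but even a local primary still pays one round trip, because forced awareness places the replica on the other side. That is consistent with the point above that the inter-site latency itself is the underlying problem.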
--
You received this message because you are subscribed to the Google
Groups "elasticsearch" group.