We have a relatively large monthly index (it reaches approx. 1TB by the end of the month, so about 30GB are added daily) that handles hundreds of updates per second. It has 10 primary shards with 1 replica each (20 shard copies in total), spread evenly across 10 physical machines, so each server holds 2 shards.
However, I've noticed that the load distribution is far from balanced: some machines hold 2 primary shards for the index, and they seem to be doing most of the update work, with a full GC cycle every 7 minutes or so.
Machines with 1 primary shard and 1 replica experience a full GC approx. every 30 minutes, and machines that hold only replica shards are the least utilized, with approx. 45 minutes between full GC cycles.
Machine with 2 primary shards:
Machine with 1 primary and 1 replica:
Machine with 2 replica shards:
Is there any way to rebalance the primary shards so that there is no more than 1 primary shard on each machine?
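For anyone who wants to check the same thing on their own cluster, here is a minimal sketch of how the primary count per node could be tallied from the `_cat/shards` API. The function names, the base URL and the index name are placeholders of mine, and it assumes the cluster is reachable over plain HTTP without authentication.

```python
import json
from collections import Counter
from urllib.request import urlopen


def primaries_per_node(shard_rows):
    """Count primary shards per node from _cat/shards JSON rows.

    Each row is expected to carry the "prirep" ("p" or "r") and "node"
    columns. Nodes that hold only replicas are reported with a count of 0.
    """
    counts = Counter()
    for row in shard_rows:
        node = row.get("node")
        if node is None:
            # Unassigned shards have no node; skip them.
            continue
        counts[node] += 0  # make sure replica-only nodes show up too
        if row.get("prirep") == "p":
            counts[node] += 1
    return dict(counts)


def fetch_shard_rows(base_url, index):
    """Fetch shard rows for one index (hypothetical helper, no auth/TLS)."""
    url = f"{base_url}/_cat/shards/{index}?format=json&h=index,shard,prirep,node"
    with urlopen(url) as resp:
        return json.load(resp)
```

Calling `primaries_per_node(fetch_shard_rows("http://localhost:9200", "my-index"))` (both arguments are placeholders) would return a mapping of node name to primary count, which makes it easy to spot nodes holding 2 primaries versus nodes holding none.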
It is not causing any issue at the moment, but it means that we can't really scale out: the more data we push into this index, the more load the machines with the primary shards will need to handle, until at some point they crash. In this case, adding more shards or more machines is not going to help at all, since we can't guarantee even load distribution. Even if we double the number of shards and machines, we may remain in the same situation, with some machines handling most of the load while others are mostly idle.
To answer @warkolm's question, we are not using allocation awareness (I'm not sure how it would help), and I'm not sure what you mean by "node or a host". These are screen captures from Kibana showing the JVM heap utilization of the different data nodes in the cluster.
Well, that's my point really.
The GC graphs look fine now, but if we increase the load, full GC cycles will become much more frequent on the nodes with two primary shards, because the load is not properly distributed.
I do wonder whether this kind of load distribution is specific to the update use case (where the primary shard does more work than the replica)?