Skewed primary shard distribution leads to performance issues

We have a relatively large monthly index (it reaches approx. 1TB by the end of the month, so about 30GB are added daily) that handles hundreds of updates per second. It has 10 shards with 1 replica and is spread out evenly across 10 physical machines, 2 shards on each server.
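The per-node layout can be confirmed with the `_cat/shards` API; here's a quick sketch in Python (the host and the index name `myindex-2017.06` are placeholders):

```python
import requests
from collections import Counter

ES = "http://localhost:9200"      # placeholder host
INDEX = "myindex-2017.06"         # placeholder index name

# _cat/shards lists every shard copy with its role (p = primary, r = replica)
# and the node it is currently allocated to.
shards = requests.get(f"{ES}/_cat/shards/{INDEX}", params={"format": "json"}).json()

primaries = Counter(s["node"] for s in shards if s["prirep"] == "p" and s["node"])
replicas = Counter(s["node"] for s in shards if s["prirep"] == "r" and s["node"])

for node in sorted(set(primaries) | set(replicas)):
    print(f"{node}: {primaries[node]} primaries, {replicas[node]} replicas")
```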

I've noticed, however, that the load distribution is far from balanced - some machines have 2 primary shards for the index, and they seem to be doing most of the update work, with a full GC cycle every 7 minutes or so.
Machines with 1 primary shard and 1 replica experience a full GC approx. every 30 minutes, and machines that only hold replica shards are the least utilized, with approx. 45 minutes between full GC cycles.

Machine with 2 primary shards: [Kibana JVM heap screenshot]

Machine with 1 primary and 1 replica: [Kibana JVM heap screenshot]

Machine with 2 replica shards: [Kibana JVM heap screenshot]
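The screenshots are from Kibana, but the same GC counters can also be pulled directly from the nodes stats API - a rough sketch with a placeholder host:

```python
import requests

ES = "http://localhost:9200"  # placeholder host

# Per-node JVM stats include old-gen GC counts and total collection time,
# which is what the heap/GC graphs are based on.
stats = requests.get(f"{ES}/_nodes/stats/jvm").json()

for node in stats["nodes"].values():
    old_gc = node["jvm"]["gc"]["collectors"]["old"]
    print(f'{node["name"]}: {old_gc["collection_count"]} old-gen GCs, '
          f'{old_gc["collection_time_in_millis"]} ms total')
```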

Is there any way to rebalance the primary shards so that there is no more than 1 primary shard on each machine?
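The only manual workaround I can think of is cancelling the primary's allocation via the `_cluster/reroute` API so that its replica gets promoted - a rough sketch (shard number, node name, and index name are placeholders, and I haven't verified this at scale):

```python
import requests

ES = "http://localhost:9200"   # placeholder host

# Cancelling an active primary copy (allow_primary=true) makes the replica on
# another node get promoted to primary; the cancelled copy then recovers as a
# replica. Index name, shard number, and node name below are placeholders.
body = {
    "commands": [
        {
            "cancel": {
                "index": "myindex-2017.06",
                "shard": 3,
                "node": "node-with-2-primaries",
                "allow_primary": True,
            }
        }
    ]
}
resp = requests.post(f"{ES}/_cluster/reroute", json=body)
resp.raise_for_status()
```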

Any suggestions will be appreciated :expressionless:

Is this a node or a host?
Are you using allocation awareness?

What type of issues is this causing?

Apologies for the late reply, I was on vacation.

It is not causing any issues at the moment, but it means that we can't really scale out: the more data we push into this index, the more load the machines holding the primary shards will need to handle, until at some point they crash. In this case adding more shards or more machines is not going to help at all, since we can't guarantee even load distribution; even if we double the number of shards and machines, we may end up in the same situation, with some machines handling most of the load while others remain mostly idle.

To answer @warkolm's question, we are not using allocation awareness (I'm not sure how it would help), and I'm not sure what you mean by "node or a host". These are screen captures from Kibana showing the JVM heap utilization of the different data nodes in the cluster.

Looking at the GC graphs, they seem fine - you have a very nice sawtooth with good gaps, i.e. it's not happening every minute.

Have a look at https://www.elastic.co/guide/en/elasticsearch/reference/5.4/allocation-awareness.html, but it won't stop ES from putting multiple primaries on the same node.
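For reference, awareness just needs a node attribute plus one cluster setting, roughly like this (a sketch; the `rack_id` attribute name, its value, and the host are examples):

```python
import requests

ES = "http://localhost:9200"  # placeholder host

# Each node first needs an attribute in its elasticsearch.yml, e.g.:
#   node.attr.rack_id: rack_one
# Then the cluster is told to spread shard copies across that attribute:
settings = {
    "persistent": {
        "cluster.routing.allocation.awareness.attributes": "rack_id"
    }
}
resp = requests.put(f"{ES}/_cluster/settings", json=settings)
resp.raise_for_status()
```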

Well that's my point really :slight_smile:
The GC graphs look fine now, but if we increase the load, full GCs will become much more frequent on the nodes with two primary shards, because the load is not evenly distributed.

I do wonder if this kind of load distribution is specific to the update use case, where the primary shard does more work than the replica?

How does it do more?

Indeed, the primary executes the computation of the new version of the document and then sends the result to the replica.

In that sense, the primary shard does a bit more work than the replica, I guess.
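For example, with a scripted update the script runs on the primary only, and what gets replicated is the resulting document rather than the script - a rough sketch (host, index, type, id, and field name are made up):

```python
import requests

ES = "http://localhost:9200"  # placeholder host

# Scripted update: the primary shard fetches the current document, runs the
# script to produce the new version, indexes it, and forwards the resulting
# document to the replica, which only has to index it.
body = {
    "script": {
        "lang": "painless",
        "inline": "ctx._source.counter += params.increment",  # made-up field
        "params": {"increment": 1},
    }
}
resp = requests.post(f"{ES}/myindex-2017.06/doc/1/_update", json=body)  # placeholder index/type/id
resp.raise_for_status()
```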
