Unexpected and uneven per-index shard allocation

Hello,

We have a 5-node ES 2.3.3 cluster that we use for a write-heavy logging workload. We keep 90 daily indices on it, each with the default settings of {"number_of_shards": 5, "number_of_replicas": 1}. Each index contains about 80M documents and takes up about 70GB.
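For completeness, each daily index effectively ends up with settings equivalent to creating it like this (the index name is just an example of our naming):

PUT /logs-2016.07.12
{
  "settings": {
    "number_of_shards": 5,
    "number_of_replicas": 1
  }
}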

When new indices are created, they almost always have an uneven shard allocation. The following allocation is pretty typical for our 5 (P)rimaries and 5 (R)eplicas per index:

Node 1: P0 P1 P2 P3 P4
Node 2: R1
Node 3: R3
Node 4: R2
Node 5: R0 R4

It's worth noting that the exact allocation for Nodes 2-5 will vary a bit, but Node 1 almost always gets all 5 primary shards.
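(For reference, the layout above comes from something like this, with the index name just an example:

GET /_cat/shards/logs-2016.07.12?v

which lists each shard copy, whether it's a primary or replica, and the node it landed on.)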

I understand that we don't need to worry about the uneven distribution of primaries, but we are seeing poor bulk indexing performance with the uneven default allocation. When we manually adjust the allocation to an even 2 shards per node--what we would expect by default--we have no indexing bottlenecks.
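For the record, the manual adjustment is done with cluster reroute move commands along these lines (the index, shard number, and node names are just examples):

POST /_cluster/reroute
{
  "commands": [
    { "move": { "index": "logs-2016.07.12", "shard": 1, "from_node": "node-1", "to_node": "node-2" } }
  ]
}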

I'm wondering what we can do to automatically get an even per-index shard allocation. I've found some hacky approaches like initially applying a restrictive total_shards_per_node when creating an index, then relaxing it later. Setting {"cluster.routing.allocation.balance.index": 1} doesn't seem to affect initial allocations.
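The hacky approach looks roughly like this at creation time, with the restriction relaxed once the index is no longer being written to (again, the index name is an example):

PUT /logs-2016.07.13
{
  "settings": {
    "number_of_shards": 5,
    "number_of_replicas": 1,
    "index.routing.allocation.total_shards_per_node": 2
  }
}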

(I also tried creating empty indices with 1 replica and 10, 15, and 20 shards. In all cases, I ended up with 5 primaries on Node 1. The overall allocation of the 10-shard index was almost perfectly even, but the 15- and 20-shard indices came out very uneven. I also tried creating a 5-shard, 0-replica index, and all 5 primaries were again on Node 1. I don't think any of this is particularly relevant to solving the current problem, but it does point to some affinity Node 1 has for hosting 5 shards per index!)

Key settings are below. Happy to provide any other settings/details that would be helpful to see.

Thanks!


Cluster-wide /etc/elasticsearch/elasticsearch.yml:

cluster.name: cluster_name
node.name: node_name
path.data: /path/to/elasticsearch/data0,/path/to/elasticsearch/data1
path.logs: /path/to/elasticsearch/log
network.bind_host: "0.0.0.0"
network.publish_host: _non_loopback_
discovery:
  type: ec2
  ec2:
    groups: aws-security-group-name
script.engine.groovy.inline.aggs: on

GET /_cluster/settings:

{
  "persistent": {},
  "transient": {
    "cluster.routing.allocation.enable": "all",
    "cluster.routing.allocation.balance.index": "1",
    "threadpool.bulk.queue_size": "100"
  }
}

Can you set

cluster.routing.rebalance.enable: all
cluster.routing.allocation.allow_rebalance: always

and see if that makes a difference?
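For example, as transient cluster settings:

PUT /_cluster/settings
{
  "transient": {
    "cluster.routing.rebalance.enable": "all",
    "cluster.routing.allocation.allow_rebalance": "always"
  }
}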

Also, what is the disk storage capacity on all 5 of your nodes?

total_shards_per_node is a powerful tool, but you have to be careful with it. I'd only apply it to the index I'm currently writing and remove it after the index is done being written.
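One way to do that once writes to the index are finished is to set it back to its unbounded default of -1, roughly like this (the index name is just an example):

PUT /logs-2016.07.13/_settings
{
  "index.routing.allocation.total_shards_per_node": -1
}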

The thing about Elasticsearch's allocation is that it treats every shard as equal. Along those lines, what are the total shard counts on your nodes? Do you have shards with vastly different sizes? That situation does come up, and there aren't many better tools for it than total_shards_per_node.
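A quick way to see both the per-node shard counts and disk usage in one place:

GET /_cat/allocation?v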

@jkuang, cluster.routing.rebalance.enable was already defaulting to all, and I set cluster.routing.allocation.allow_rebalance to always. I got the same uneven allocation when creating a test index with default settings, but it seems like I might need to push some data into it before allow_rebalance becomes relevant.

Disk storage capacity by node:

  • Node 1: 2TB, 82% used
  • Node 2: 2TB, 75% used
  • Node 3: 2TB, 74% used
  • Node 4: 4TB, 40% used
  • Node 5: 4TB, 39% used

@nik9000, completely agreed on the risks of total_shards_per_node! I was just relaxing it after writing, but I think you're right that removing it entirely is the safer move.

You're on to something with the total shard counts across nodes:

  • Node 1: 359
  • Node 2: 555
  • Node 3: 554
  • Node 4: 554
  • Node 5: 555

The shard totals here are much larger than the 90 * 10 = 900 my original post suggests, because there are also about 150 much smaller indices (a cumulative data total of ~50GB, but each with 5 shards and 1 replica). If Elasticsearch sees all shards as equal, this explains why so many new shards land on Node 1: it has by far the fewest shards.

To help equalize shard counts and sizes as much as possible, it seems I should limit the small indices to a single shard. What can I do within an index, though, to keep each shard approximately the same size, or is that less critical?

Usually each shard in an index is going to be about the same size. Unless you are using _routing or _parent. Usually.

For small indices I'd indeed limit them to a single shard.

Are these time based indices? If so you should be able to make the change to their template to change the number of shards and slowly the cluster should even out. You could also manually _reindex the older ones but you'd have to deal with having two copies of the data for a while (one in the old index, one in the new). But you'd be able to work on that actively and that ought to help with the allocation. If you did that I'd prioritize indices that have many shards on nodes 4 and 5.
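The template change is roughly along these lines (the template name and pattern are just examples; in 2.x the match goes in the template field):

PUT /_template/small-logs
{
  "template": "small-*",
  "settings": {
    "number_of_shards": 1,
    "number_of_replicas": 1
  }
}

And a manual reindex of an older index would look something like this, with the destination index created (or template-matched) as a single shard beforehand:

POST /_reindex
{
  "source": { "index": "small-2016.05.01" },
  "dest": { "index": "small-2016.05.01-1shard" }
}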

@nik9000, all the shards in an index at least have equivalent document counts even when their on-disk sizes differ, so that makes sense.

The majority of the small indices are time-based, so I've updated their template settings to limit them to a single shard, and they'll eventually cause less of an imbalance. There might be a few to _reindex, but for the most part I think I'll just wait it out.

Until the per-node shard counts reach a natural equilibrium, I'll continue to use total_shards_per_node to avoid bottlenecks during bulk indexing.

Thanks for your help here--feeling much more confident in my understanding of the allocation dynamics!

You're quite welcome! Have a good night/whatever time it is for you now.

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.