Hello,
We have a 5-node ES 2.3.3 cluster that we use for a write-heavy logging workload. We keep 90 daily indices on it, each with the default settings of {"number_of_shards": 5, "number_of_replicas": 1}. Each index contains about 80M documents and takes up about 70GB.
When new indices are created, they almost always have an uneven shard allocation. The following allocation is pretty typical for our 5 (P)rimaries and 5 (R)eplicas per index:
Node 1: P0 P1 P2 P3 P4
Node 2: R1
Node 3: R3
Node 4: R2
Node 5: R0 R4
It's worth noting that the exact allocation for Nodes 2-5 will vary a bit, but Node 1 almost always gets all 5 primary shards.
I understand that we don't need to worry about the uneven distribution of primaries, but we are seeing poor bulk indexing performance with the uneven default allocation. When we manually adjust the allocation to an even 2 shards per node--what we would expect by default--we have no indexing bottlenecks.
I'm wondering what we can do to automatically get an even per-index shard allocation. I've found some hacky approaches like initially applying a restrictive total_shards_per_node when creating an index, then relaxing it later. Setting {"cluster.routing.allocation.balance.index": 1} doesn't seem to affect initial allocations.
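For reference, here is a rough sketch of what that total_shards_per_node workaround could look like. It only builds the request bodies and sends nothing to the cluster. The helper names (even_shard_limit, create_index_body, relax_limit_body) are hypothetical, and the full setting name index.routing.allocation.total_shards_per_node with -1 meaning "no limit" is my understanding of the 2.x index settings, so treat it as an assumption to verify rather than a tested recipe.

```python
import json
import math


def even_shard_limit(primaries, replicas, nodes):
    """Smallest per-node cap that still lets every shard copy be assigned."""
    total_copies = primaries * (1 + replicas)
    return math.ceil(total_copies / nodes)


def create_index_body(primaries, replicas, nodes):
    """Body for PUT /<index>: create the index with a restrictive per-node cap."""
    return {
        "settings": {
            "number_of_shards": primaries,
            "number_of_replicas": replicas,
            # Assumed setting name; caps how many shards of this index one node may hold.
            "index.routing.allocation.total_shards_per_node":
                even_shard_limit(primaries, replicas, nodes),
        }
    }


def relax_limit_body():
    """Body for PUT /<index>/_settings: -1 is assumed to lift the cap again."""
    return {"index.routing.allocation.total_shards_per_node": -1}


if __name__ == "__main__":
    # Our case: 5 primaries, 1 replica, 5 nodes.
    print(json.dumps(create_index_body(5, 1, 5), indent=2))
    print(json.dumps(relax_limit_body(), indent=2))
```

With our numbers the cap works out to 2 shards per node. The reason for relaxing it afterwards is that a cap of 2 leaves no slack: if one node drops out, the remaining 4 nodes can hold only 8 of the 10 shard copies, so 2 would stay unassigned until the limit is lifted.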
(I also tried creating empty indices with 1 replica and 10, 15, and 20 shards. In all cases, I ended up with 5 primaries on Node 1. This worked well for the 10-shard index, which was almost perfectly even, but resulted in very uneven allocations for the 15- and 20-shard indices. I also tried creating a 5-shard 0-replica index, and all 5 primaries were again on Node 1. I don't think any of this is particularly relevant to the solution of the current problem, but it does point to some affinity that Node 1 has for hosting 5 shards per index!)
Key settings are below. Happy to provide any other settings/details that would be helpful to see.
Thanks!
Cluster-wide /etc/elasticsearch/elasticsearch.yml:
cluster.name: cluster_name
node.name: node_name
path.data: /path/to/elasticsearch/data0,/path/to/elasticsearch/data1
path.logs: /path/to/elasticsearch/log
network.bind_host: "0.0.0.0"
network.publish_host: _non_loopback_
discovery:
  type: ec2
  ec2:
    groups: aws-security-group-name
script.engine.groovy.inline.aggs: on
GET /_cluster/settings:
{
  "persistent": {},
  "transient": {
    "cluster.routing.allocation.enable": "all",
    "cluster.routing.allocation.balance.index": "1",
    "threadpool.bulk.queue_size": "100"
  }
}