We are using Elasticsearch 1.7.4 with around 400 nodes evenly distributed across AWS availability zones, and the following cluster settings:
cluster.routing.allocation.awareness.attributes: aws_availability_zone
cluster.routing.allocation.awareness.force.aws_availability_zone.values: us-east-1a,us-east-1d,us-east-1e
One of the indices consists of 1 primary and 17 replicas, so the expected behaviour is 6 shards per aws_availability_zone, but I see 3 shards unassigned.
When I try to manually allocate a shard to hosts in those 3 availability zones, it fails with "too many shards on nodes for attribute: [aws_availability_zone]".
I verified using _cat/shards?index=my_index that 1a, 1d, and 1e have 5 shards each. Per the AwarenessAllocationDecider:
shardCount = 18 (17+1)
averagePerAttribute = 6 (18/3)
requiredCountPerAttribute = 6
leftoverPerAttribute = 0
currentNodeCount = 5
so (currentNodeCount > (requiredCountPerAttribute + leftoverPerAttribute)) evaluates to false. Why then am I getting allocation decision NO?
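To make the numbers above concrete, here is a minimal shell sketch of that arithmetic. The variable names mirror the decider's, and currentNodeCount is assumed to be the number of shards already in the target zone per _cat/shards; by this reading the decision should be YES, not NO:

```shell
# Reproduce the AwarenessAllocationDecider arithmetic from the values above.
shard_count=18            # 1 primary + 17 replicas
num_attributes=3          # us-east-1a, us-east-1d, us-east-1e
required_per_attribute=$(( shard_count / num_attributes ))  # 6
leftover_per_attribute=$(( shard_count % num_attributes ))  # 0
current_node_count=5      # shards already in the target zone per _cat/shards

if [ "$current_node_count" -gt $(( required_per_attribute + leftover_per_attribute )) ]; then
  echo "decision: NO"
else
  echo "decision: YES"
fi
```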
It would be great if someone could help me debug this!
Our workaround for this problem is to increase the replica count to a large number, 23 (without losing quorum); Elasticsearch starts assigning a batch of shards, and we then bring the count back to 17. I think due to this I observed another index with 5 shards in 1a, 5 in 1d, 7 in 1e, and 1 shard unassigned, which is strange because the cluster should rebalance and not allow more than 6 shards per aws_availability_zone.
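For reference, the workaround is roughly the following (the localhost endpoint is a placeholder for one of our client nodes):

```shell
# Temporarily raise the replica count to trigger a fresh allocation round,
# then restore the original value. Endpoint is a placeholder.
curl -XPUT 'http://localhost:9200/my_index/_settings' -d '{
  "index": { "number_of_replicas": 23 }
}'
# ... wait until a batch of shards starts assigning ...
curl -XPUT 'http://localhost:9200/my_index/_settings' -d '{
  "index": { "number_of_replicas": 17 }
}'
```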
It would also be great if you could point me to any related issues.