New replicas are not getting assigned

I use kopf for visualization. Using kopf, I tried changing the number-of-replicas setting for an index from 17 to 20, which makes each replication group 21 copies. (The index has 20 primary shards in total, and we run across 3 availability zones.)
What I observe is that it assigns some of the replicas (mainly in groups of 60), but the others stay unassigned. Any pointers on debugging this issue?

Elasticsearch version 1.7

You can try the Explain API, which can tell you why it could not assign some shards.

I don't think this API is available in 1.7. Is there any other way to figure out the issue?

To share more information: it mainly fails on "too many shards on nodes for attribute: [aws_availability_zone]", but I can clearly see that there are many hosts in the availability zone without any of the shards. The hack I use to force allocation is to increase the number of replicas to a relatively high value, so that the cluster picks a bunch of shards for allocation; I then reduce the replicas back to what is required, and this solves the problem of unassigned shards. The process is very annoying when you have many indices :confused:

Why do you have so many replicas?

Because we need more replicas for a business requirement. Is there an issue with a larger number of replicas in 1.7?

No, it's just unusual to see many people running that many :slight_smile:

We have the following settings for forced awareness:
cluster.routing.allocation.awareness.force.availability_zone.values: zone1, zone2, zone3
cluster.routing.allocation.awareness.attributes: availability_zone

This will get applied to assigning and relocating shards, but will it include new replicas added to the replication group while calculating "shardPerAttribute"?

Adding a little more analysis:
With 11 replicas and 20 primaries, there are 240 shards in total.
Zone 1 - 73 assigned / 7 unassigned
Zone 2 - 65 assigned / 15 unassigned
Zone 3 - 78 assigned / 2 unassigned
I tried manually rerouting one of the shards to a host in each zone, and got:
NO(too many shards on nodes for attribute: [availability_zone])
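
For reference, the numbers above can be checked with a short sketch. It assumes that each zone's expected share is simply the total number of shard copies divided evenly across the three zones. That is an interpretation of the figures in this post, not a statement about how Elasticsearch computes its limit internally. The zone names are placeholders.

```python
# Sanity check of the shard counts reported above.
# Assumption: "expected per zone" is an even split of all shard copies.

def total_shard_copies(primaries, replicas):
    """Every primary has `replicas` extra copies, so each shard has 1 + replicas copies."""
    return primaries * (1 + replicas)

primaries, replicas, zones = 20, 11, 3

total = total_shard_copies(primaries, replicas)
share, remainder = divmod(total, zones)

# (assigned, unassigned) per zone, as reported in the post; names are placeholders.
observed = {"zone1": (73, 7), "zone2": (65, 15), "zone3": (78, 2)}

print("total shard copies:", total)
print("even share per zone:", share, "remainder:", remainder)
for zone, (assigned, unassigned) in observed.items():
    matches = (assigned + unassigned) == share
    print(zone, "assigned + unassigned =", assigned + unassigned, "matches share:", matches)
```

Under that assumption, every zone's assigned plus unassigned count adds up to the same even share, so the unassigned shards are spread across all three zones rather than being concentrated in one.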

I also ran reroute with explain to get unassigned_info; it reports "reason": "NODE_LEFT", which is expected.
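
A minimal sketch of how such a reroute-with-explain request could be put together, for anyone who wants to reproduce it. The host, index name, shard number, and node name are all hypothetical placeholders, and the body uses the `allocate` command as I understand it for the 1.x cluster reroute API. The sketch only builds the request and does not send it.

```python
import json
from urllib.request import Request

def build_reroute_explain_request(base_url, index, shard, node):
    """Build (but do not send) a POST to /_cluster/reroute?explain.

    The `allocate` command asks the cluster to place one unassigned
    shard copy on a specific node; `explain` asks for the deciders' reasons.
    """
    body = {
        "commands": [
            {"allocate": {"index": index, "shard": shard, "node": node}}
        ]
    }
    url = base_url.rstrip("/") + "/_cluster/reroute?explain"
    return Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

# Hypothetical values; replace with a real host, index, shard, and node name.
req = build_reroute_explain_request("http://localhost:9200", "my_index", 3, "node-in-zone2")
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))
```

Passing the request to `urllib.request.urlopen` would actually send it. Note that `explain` on its own does not prevent the command from being applied if it is allowed.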

I am curious: what happens when the number of hosts in these zones is not equal? Will that create any imbalance in assigning shards? Our index setting for "total_shards_per_node" is the default.

I am a bit confused. How many indices do these 240 shards belong to? How many primary shards does each index have? What is the number of replicas set to for these indices? How many data nodes do you have per availability zone?

This is the analysis of 1 index with 20 primaries and 11 replicas, with around 80 data nodes in each zone. We do have other indices besides this one.

If I have understood your configuration correctly, I would expect Elasticsearch to only assign the primary shard and 2 replicas of that shard as you have defined awareness of 3 zones. That is the purpose of shard allocation awareness according to the docs. Do all nodes have the allocation awareness parameters configured?

Yes, all nodes have the allocation awareness parameters set; I verified that.
Are you saying that with 3 zones we can only have a 1 primary + 2 replicas setting? If we have more replicas, what is the expected behavior?

I would expect those replicas to be unassigned. If you wanted to have 5 replicas (6 copies of each shard), you could divide up a zone into parts, e.g. zone1a, zone1b, zone2a, zone2b, zone3a and zone3b. If you leave out or alter the forced allocation parameter, Elasticsearch will try to allocate one shard per zone and will now be able to place one primary shard and 5 replicas. This is quite well explained in the example given here.

That makes sense. I'll have to revisit our Elasticsearch awareness settings.

I am curious: with the current force settings, if I look at the replication group of shard 3, it should have 9 unassigned shards, but all of them are assigned evenly across zones. That is strange and not the expected behavior!

That is what surprised me too, and why I asked if all nodes have all parameters correctly set. It does sound strange.

Am curious where this error comes from, given that you have specified the allocation awareness attribute as just availability_zone. Is there a mismatch in the configuration?

As per this conversation, I understand that removing "cluster.routing.allocation.awareness.force.availability_zone.values: zone1, zone2, zone3" might resolve this issue. I'll quickly test it, as I don't see the need for this setting for now, because we definitely need more than 2 replicas.

Sorry about that, there is no mismatch. The attribute is aws_availability_zone. I observed this error when I ran reroute for a particular shard.