We want to make shard allocation aware of the infrastructure topology.
We have racks, and within those racks, we have zones. A zone consists of multiple physical machines. A physical machine has multiple virtual machines.
I have been experimenting with multiple awareness attributes, but they do not seem to work. Allocation works correctly when there is only 1 awareness attribute (e.g. rack).
Allocation awareness is not hierarchical. What's happening here is that the rack constraint ensures that shard copies are spread across the different racks. The zone constraint tries to ensure that shard copies are spread across the zones, but it does so independently of the racks.
Let me illustrate with an example (index with 1 primary shard and 4 replicas - only 3 racks and 2 zones for simplicity):
Rack1 Rack2 Rack3
Z1 Z2 Z1 Z2 Z1 Z2
P RR RR
Rack 1 has 1 shard and racks 2 and 3 have 2 shards, so the racks are balanced.
Zone 1 has 3 shards and zone 2 has 2 shards, so the zones are balanced as well.
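To make the independence of the two checks concrete, here is a minimal Python sketch that counts shard copies per rack and per zone separately. The exact zone of each copy is an assumption (the example above only fixes the per-rack and per-zone totals), and the "balanced" rule used here, that no value holds more than ceil(copies / number of values), is a simplification for illustration, not Elasticsearch's actual decider code. The names placement_a, placement_b, copies_per_value and is_balanced are made up for this sketch.

```python
from collections import Counter
from math import ceil

# Two assumed placements of the 5 shard copies as (rack, zone) pairs.
# Both match the totals in the example: racks 1/2/2, zones 3/2.
placement_a = [
    ("rack1", "zone1"),  # primary
    ("rack2", "zone1"), ("rack2", "zone2"),
    ("rack3", "zone1"), ("rack3", "zone2"),
]
# Here both copies in rack2 share a zone, and so do both copies in rack3.
placement_b = [
    ("rack1", "zone1"),  # primary
    ("rack2", "zone2"), ("rack2", "zone2"),
    ("rack3", "zone1"), ("rack3", "zone1"),
]

def copies_per_value(placement, index):
    """Count shard copies per rack (index 0) or per zone (index 1)."""
    return Counter(node[index] for node in placement)

def is_balanced(counts, num_values):
    """Simplified rule: no value holds more than ceil(total / num_values)."""
    limit = ceil(sum(counts.values()) / num_values)
    return all(c <= limit for c in counts.values())

for name, placement in (("a", placement_a), ("b", placement_b)):
    racks = copies_per_value(placement, 0)
    zones = copies_per_value(placement, 1)
    # Each attribute is checked on its own; zones are never checked per rack.
    print(name, dict(racks), dict(zones),
          is_balanced(racks, 3), is_balanced(zones, 2))
```

Under this simplified rule both placements pass the rack check and the zone check, even though in the second one the copies inside rack 2 and rack 3 are not spread over that rack's zones. That is what "not hierarchical" means here.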