We want to make shard allocation aware of the infrastructure topology.
We have racks, and within those racks, we have zones. A zone consists of multiple physical machines. A physical machine has multiple virtual machines.
I have been experimenting with multiple awareness attributes, but they do not seem to work. Allocation works correctly with a single awareness attribute (e.g. rack).
An example in the elasticsearch.yml file:
How can you create and use multiple awareness attributes correctly?
Can you provide an example of how you would expect shards to be allocated and how they are actually allocated?
I have 3 racks. In each rack there are 5 physical machines ("zones"), and each physical machine (= 1 zone) hosts several virtual machines.
I want to make sure the primary shards are in at least 1 rack.
I want to make sure that a primary shard and its replicas are never in the same zone.
Allocation awareness is not hierarchical. What's happening here is that the rack constraint ensures shard copies are spread among the different racks, while the zone constraint tries to spread them among the zones independently of the racks: zone Z1 in one rack and zone Z1 in another rack count as the same zone.
Let me illustrate with an example (index with 1 primary shard and 4 replicas - only 3 racks and 2 zones for simplicity):
Rack1     Rack2     Rack3
Z1  Z2    Z1  Z2    Z1  Z2
P         RR            RR
- Rack 1 has 1 shard and racks 2 and 3 have 2 shards, so the racks are balanced.
- Zone 1 has 3 shards and zone 2 has 2 shards, so the zones are balanced as well, even though the two replicas in each of racks 2 and 3 sit in the same zone.
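The counts in this example can be checked with a short Python sketch. It encodes one placement consistent with the bullet points (which rack holds which replica pair is an assumption; swapping them gives the same counts), applies a simplified balance rule in which no attribute value may hold more than ceil(copies / values) copies (an illustration, not Elasticsearch's exact decider logic), and then groups the copies by (rack, zone), i.e. by physical machine.

```python
from collections import Counter
from math import ceil

# One placement consistent with the example: 1 primary (P) + 4 replicas (R).
# Which rack holds which replica pair is an assumption; the counts are the same either way.
copies = [
    ("Rack1", "Z1", "P"),
    ("Rack2", "Z1", "R"),
    ("Rack2", "Z1", "R"),
    ("Rack3", "Z2", "R"),
    ("Rack3", "Z2", "R"),
]


def is_balanced(counts, num_values, total):
    """Simplified awareness rule: no value holds more than ceil(total / num_values) copies."""
    limit = ceil(total / num_values)
    return all(n <= limit for n in counts.values())


# Each attribute is counted on its own, just as independent awareness attributes are.
rack_counts = Counter(rack for rack, _, _ in copies)
zone_counts = Counter(zone for _, zone, _ in copies)

# Grouping by (rack, zone) identifies the physical machine.
machine_counts = Counter((rack, zone) for rack, zone, _ in copies)
shared = [machine for machine, n in machine_counts.items() if n > 1]

print(dict(rack_counts), is_balanced(rack_counts, 3, len(copies)))
print(dict(zone_counts), is_balanced(zone_counts, 2, len(copies)))
print("machines holding more than one copy:", shared)
```

Both per-attribute checks pass, yet `shared` lists two physical machines that each hold two copies of the same shard, which is exactly what the user wants to avoid.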
What you probably want is allocation awareness on the racks and something like the cluster.routing.allocation.same_shard.host constraint for the zones (see https://www.elastic.co/guide/en/elasticsearch/reference/5.0/shards-allocation.html#_shard_allocation_settings).
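To make the suggested combination concrete, here is a toy Python model (not Elasticsearch code) of rack awareness plus a "never two copies of a shard on the same physical machine" rule. The names `Node`, `allowed`, and `place`, the ceil(copies / racks) limit per rack, the greedy placement order, and the 3 racks x 2 machines x 2 VMs layout are all hypothetical simplifications; how Elasticsearch itself decides that two nodes share a host is not modeled.

```python
from collections import Counter
from dataclasses import dataclass
from math import ceil


@dataclass(frozen=True)
class Node:
    name: str
    rack: str
    machine: str  # hypothetical: physical machine ("zone") hosting this VM, unique cluster-wide


def allowed(candidate, holders, racks, total_copies):
    """Simplified check for placing one more copy of a shard on `candidate`."""
    # Same-machine rule (the idea behind same_shard.host): no two copies of
    # the same shard on VMs that share a physical machine.
    if any(n.machine == candidate.machine for n in holders):
        return False
    # Simplified rack awareness: no rack may hold more than ceil(copies / racks).
    limit = ceil(total_copies / len(racks))
    in_rack = sum(1 for n in holders if n.rack == candidate.rack)
    return in_rack + 1 <= limit


def place(nodes, total_copies):
    """Greedy toy allocator: put each copy on the first node that is allowed."""
    racks = {n.rack for n in nodes}
    holders = []
    for _ in range(total_copies):
        target = next((n for n in nodes if allowed(n, holders, racks, total_copies)), None)
        if target is None:
            break  # no valid node left; the remaining copies stay unplaced
        holders.append(target)
    return holders


# Hypothetical cluster: 3 racks x 2 physical machines x 2 VMs each.
nodes = [
    Node(f"r{r}-m{m}-vm{v}", f"rack{r}", f"r{r}-m{m}")
    for r in range(1, 4)
    for m in range(1, 3)
    for v in range(1, 3)
]

placed = place(nodes, total_copies=5)  # 1 primary + 4 replicas
print([n.name for n in placed])
print(Counter(n.rack for n in placed))
```

Because the machine identifiers are unique across the whole cluster (`r1-m1` is different from `r2-m1`), the same-machine rule behaves hierarchically, unlike a zone attribute whose values repeat in every rack.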
Thank you for this clear explanation.
cluster.routing.allocation.same_shard.host works only when the nodes have the same hostname AND IP address.