Shard and Zone awareness


(Ranjith M) #1

We have built a new cluster with rack and zone awareness. In our testing we have found that shards are being distributed across racks properly, but we have issues with distributing shards across zones.

Settings in elasticsearch.yml

node.attr.rack_id: az2
node.attr.zone: cap2

PUT /_cluster/settings
{
  "transient": {
    "cluster.routing.allocation.awareness.attributes": "rack_id,zone"
  }
}

And the behavior is

We created an index (Index1) with 5 shards and 2 replicas. Here is how the primary and replica copies of shard 0 were distributed:

Node1        Node2        Node3        Node4        Node5
Rack1-Zone1  Rack1-Zone2  Rack2-Zone1  Rack2-Zone2  Rack2-Zone2
Shard0(R)    Shard0(R)    Shard0(P)

Any help would be greatly appreciated.


(Christian Dahlqvist) #2

I assume you want to make sure the copies of each shard are distributed across different rack_id values, so that a failing rack cannot affect all replicas of a shard. For this, shard allocation awareness is the correct mechanism to use.

For the zone parameter, however, I suspect you want all shards of an index in a single zone, so there you should use shard allocation filtering rather than awareness.
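[Editor's note: a minimal sketch of allocation filtering, assuming a hypothetical index named index1 and the zone attribute values used later in this thread:]

PUT /index1/_settings
{
  "index.routing.allocation.include.zone": "zone1"
}

The include filter accepts a comma-separated list of attribute values; require and exclude variants of the same setting also exist.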


(Ranjith M) #3

Hi Christian,

Thank you for the quick reply. I have read about shard allocation filtering; here is what we are looking for:

Some mechanism to make sure every copy of a shard is in a different rack, and within a rack the copies are in different zones (two copies shouldn't be in the same zone). This helps because when one zone is down we still have a copy of the shard in a different zone.


(Christian Dahlqvist) #4

Can you show how the nodes in the cluster are configured and how the shards were distributed?


#5

Hi Christian, I work with Ranjith.

The diagram depicts the behaviour within our cluster, as opposed to what we're expecting.


(Christian Dahlqvist) #6

Just to make sure that the nodes are tagged correctly, could you run this: GET _nodes/_all/host?pretty


#7

ok, so I've stripped anything sensitive:

"nodes": {
"name": "aaaa41",
"version": "5.6.3",
"roles": [
"data",
"ingest"
],
"attributes": {
"ml.max_open_jobs": "10",
"rack_id": "az2",
"ml.enabled": "true",
"zone": "zone2"
}
},
"name": "aaaa40",
"version": "5.6.3",
"roles": [
"master",
"data"
],
"attributes": {
"ml.max_open_jobs": "10",
"rack_id": "az1",
"ml.enabled": "true",
"zone": "zone1"
}
},
"name": "aaaaa42",
"version": "5.6.3",
"roles": [
"data",
"ingest"
],
"attributes": {
"ml.max_open_jobs": "10",
"rack_id": "az2",
"ml.enabled": "true",
"zone": "zone1"
}
},
"name": "aaaaa39",
"version": "5.6.3",
"roles": [
"master",
"data"
],
"attributes": {
"ml.max_open_jobs": "10",
"rack_id": "az1",
"ml.enabled": "true",
"zone": "zone2"
}
},
"name": "aaaaa43",
"version": "5.6.3",
"roles": [
"master",
"data"
],
"attributes": {
"ml.max_open_jobs": "10",
"rack_id": "az2",
"ml.enabled": "true",
"zone": "zone2"

  }

(Christian Dahlqvist) #8

So you have 5 nodes, where aaaaa41 and aaaaa43 have the same attributes. Was it these 2 nodes that both held a replica?
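[Editor's note, not from the original thread: with 2 replicas there are three copies of each shard, so with only two zone values at least two copies must share a zone by pigeonhole. If the goal is to spread copies across zones as evenly as possible and never concentrate more than necessary in one zone, forced awareness may help. A sketch, assuming the two zone values from the node output above:]

PUT /_cluster/settings
{
  "transient": {
    "cluster.routing.allocation.awareness.force.zone.values": "zone1,zone2"
  }
}

With forced awareness, Elasticsearch balances copies against the configured list of values rather than only the values currently present, leaving copies unassigned instead of over-concentrating them in one zone.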


(Ranjith M) #9

Hi Christian,

Yes, these are the two nodes that are holding the same replicas.


(system) #10

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.