Zone-A and Zone-B are two different datacenters.
The node attributes (output of GET _cat/nodeattrs?v&s=node) look like this.
Yes, we are using allocation awareness.
The elasticsearch.yml configuration has the following parameters, with values depending on the zone and the type of node (hot/warm):
node.attr.mode: data_node
node.attr.zone: "zone-A"
node.attr.temp: "hot"
The current cluster settings are like this:
{
  "persistent" : {
    "cluster" : {
      "routing" : {
        "allocation" : {
          "awareness" : {
            "attributes" : "zone",
            "force" : {
              "zone" : {
                "values" : [
                  "zone-A",
                  "zone-B"
                ]
              }
            }
          },
          "disk" : {
            "watermark" : {
              "low" : "1200gb",
              "flood_stage" : "150gb",
              "high" : "1200gb"
            }
          }
        }
      },
      "info" : {
        "update" : {
          "interval" : "60s"
        }
      }
    },
    "indices" : {
      "recovery" : {
        "max_bytes_per_sec" : "400mb"
      }
    }
  },
  "transient" : {
    "cluster" : {
      "routing" : {
        "allocation" : {
          "node_concurrent_incoming_recoveries" : "1",
          "cluster_concurrent_rebalance" : "2",
          "node_concurrent_recoveries" : "1"
        }
      }
    },
    "indices" : {
      "recovery" : {
        "max_bytes_per_sec" : "40mb"
      }
    }
  }
}
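Because `indices.recovery.max_bytes_per_sec` appears in both the persistent and the transient block, it may help to spell out which value is in effect. Elasticsearch applies transient settings over persistent ones, so the sketch below merges the two blocks in that order. The helper names `flatten` and `effective_settings` are made up for illustration, and the dictionary is a trimmed copy of the settings above.

```python
# Sketch: compute effective cluster settings, with transient values
# overriding persistent ones (Elasticsearch's documented precedence).

def flatten(settings, prefix=""):
    """Flatten nested settings into dotted keys such as 'indices.recovery.max_bytes_per_sec'."""
    flat = {}
    for key, value in settings.items():
        name = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten(value, name))
        else:
            flat[name] = value
    return flat


def effective_settings(cluster_settings):
    """Merge persistent and transient settings; transient wins on conflicts."""
    merged = flatten(cluster_settings.get("persistent", {}))
    merged.update(flatten(cluster_settings.get("transient", {})))
    return merged


# Trimmed copy of the cluster settings shown above.
cluster_settings = {
    "persistent": {"indices": {"recovery": {"max_bytes_per_sec": "400mb"}}},
    "transient": {
        "cluster": {"routing": {"allocation": {
            "node_concurrent_incoming_recoveries": "1",
            "cluster_concurrent_rebalance": "2",
            "node_concurrent_recoveries": "1",
        }}},
        "indices": {"recovery": {"max_bytes_per_sec": "40mb"}},
    },
}

effective = effective_settings(cluster_settings)
print(effective["indices.recovery.max_bytes_per_sec"])  # prints 40mb
```

So, as far as I understand the precedence rules, recoveries and relocations are currently throttled to 40mb per second, not 400mb.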
Also, the daily indices have the following settings, which allocate new indices only to hot nodes. After 90 days we move them to warm nodes.
{
  "time-data-2012.12.13" : {
    "settings" : {
      "index" : {
        "routing" : {
          "allocation" : {
            "require" : {
              "temp" : "hot"
            },
            "total_shards_per_node" : "3"
          }
        },
        "mapping" : {
          "nested_fields" : {
            "limit" : "1000"
          },
          "total_fields" : {
            "limit" : "10000"
          }
        },
        "refresh_interval" : "55s",
        "number_of_shards" : "24",
        "translog" : {
          "sync_interval" : "25s",
          "durability" : "async"
        },
        "provided_name" : "time-data-2012.12.13",
        "merge" : {
          "scheduler" : {
            "max_thread_count" : "3"
          }
        },
        "unassigned" : {
          "node_left" : {
            "delayed_timeout" : "30m"
          }
        },
        "number_of_replicas" : "2"
      }
    }
  }
}
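For context, the hot-to-warm move after 90 days amounts to changing `index.routing.allocation.require.temp` on the old index. A minimal sketch of that step follows. It assumes the index name always ends in `-YYYY.MM.DD` and that the move is a plain update-index-settings request; the function names and the `max_age_days` parameter are hypothetical, not the actual tooling in use.

```python
from datetime import date, datetime


def index_date(index_name):
    """Parse the trailing YYYY.MM.DD date from a daily index name."""
    return datetime.strptime(index_name.rsplit("-", 1)[1], "%Y.%m.%d").date()


def warm_move_request(index_name, today, max_age_days=90):
    """Return (path, body) for the settings update, or None if the index is still hot.

    The result is meant to be sent as: PUT <path> with <body> as JSON.
    """
    age_days = (today - index_date(index_name)).days
    if age_days < max_age_days:
        return None  # younger than the threshold: leave it on the hot nodes
    path = f"/{index_name}/_settings"
    body = {"index.routing.allocation.require.temp": "warm"}
    return path, body


# Example with the index shown above and an arbitrary "today".
print(warm_move_request("time-data-2012.12.13", date(2013, 4, 1)))
```

Once the setting changes, the shards no longer satisfy the `require` filter on the hot nodes, so Elasticsearch relocates them to nodes with `node.attr.temp: "warm"`.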
To give you an idea of how shards are allocated across the nodes, here is the output of the GET _cat/allocation?v&s=node API.
Every day we now manually reassign the hot shards from zone-B to zone-A, because zone-B ends up with most of the primaries, which causes load and subsequent indexing problems.
How can we make the cluster rebalance itself?
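To make the manual step concrete, here is a rough sketch of it: count the started primaries per zone, then build a body for the cluster reroute API (POST _cluster/reroute) with `move` commands. It assumes the shard rows have the shape returned by GET _cat/shards?format=json. The node names and sample rows are hypothetical and only illustrate the shape of the data.

```python
from collections import Counter


def primaries_per_zone(shards, node_zone):
    """Count started primary shards per zone.

    shards: rows shaped like GET _cat/shards?format=json (keys: prirep, state, node).
    node_zone: mapping of node name -> zone, e.g. built from _cat/nodeattrs.
    """
    counts = Counter()
    for row in shards:
        if row["prirep"] == "p" and row["state"] == "STARTED":
            counts[node_zone[row["node"]]] += 1
    return counts


def reroute_body(moves):
    """Build a cluster reroute body from (index, shard, from_node, to_node) tuples."""
    return {
        "commands": [
            {"move": {"index": index, "shard": shard,
                      "from_node": from_node, "to_node": to_node}}
            for index, shard, from_node, to_node in moves
        ]
    }


# Hypothetical node names and shard rows, for illustration only.
node_zone = {"hot-a-1": "zone-A", "hot-b-1": "zone-B"}
shards = [
    {"index": "time-data-2012.12.13", "shard": "0", "prirep": "p",
     "state": "STARTED", "node": "hot-b-1"},
    {"index": "time-data-2012.12.13", "shard": "0", "prirep": "r",
     "state": "STARTED", "node": "hot-a-1"},
    {"index": "time-data-2012.12.13", "shard": "1", "prirep": "p",
     "state": "STARTED", "node": "hot-b-1"},
]

print(primaries_per_zone(shards, node_zone))
body = reroute_body([("time-data-2012.12.13", 1, "hot-b-1", "hot-a-1")])
print(body)
```

Note that `move` relocates whichever copy of the shard lives on `from_node`, and it is rejected if the target node already holds a copy of that shard.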