Hi,
we have a cluster with this info:
{
  "status": "yellow",
  "timed_out": false,
  "number_of_nodes": 59,
  "number_of_data_nodes": 54,
  "active_primary_shards": 6205,
  "active_shards": 12409,
  "relocating_shards": 7,
  "initializing_shards": 0,
  "unassigned_shards": 1,
  "unassigned_primary_shards": 0,
  "delayed_unassigned_shards": 0,
  "number_of_pending_tasks": 0,
  "number_of_in_flight_fetch": 0,
  "task_max_waiting_in_queue_millis": 0,
  "active_shards_percent_as_number": 99.99194198227237
}
"routing": { "allocation": {"cluster_concurrent_rebalance": "2"
} },"max_shards_per_node": "3000"
},"search": {
"default_search_timeout": "5m",
"max_async_search_response_size": "50mb"
}},
"transient": {
"cluster": {
"routing": {
"allocation": {
"cluster_concurrent_rebalance": "5"
}
23 of those 54 data nodes are hot nodes. Hot nodes have 2TB of disk each, cold nodes 12TB.
The issue is that lately the cluster goes into yellow state quite regularly: it is constantly rebalancing with 5 concurrent shards, and the unassigned shard just sits there waiting for the rebalancing to finish. Example from the allocation explain API:
"can_remain_on_current_node": "yes",
"can_rebalance_cluster": "throttled",
"can_rebalance_cluster_decisions": [
{"decider": "concurrent_rebalance",
"decision": "THROTTLE",
"explanation": "reached the limit of concurrently rebalancing shards [8], cluster setting [cluster.routing.allocation.cluster_concurrent_rebalance=5]"
}],
"can_rebalance_to_other_node": "throttled",
"rebalance_explanation": "Elasticsearch is currently busy with other activities. It will rebalance this shard when those activities finish. Please wait.",
"node_allocation_decisions": [
Disk usage on the hot nodes: 30% of them are above 90% used, and 60% are above 80%.
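For reference, we collected those per-node numbers with the cat allocation API (the sort parameter is just how we happened to run it):

```
GET _cat/allocation?v&s=disk.percent:desc
```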
We are not sure if this behaviour is caused by the low free space on our hot nodes, or whether we should change the current rebalance value to better fit our needs. As far as we understand, the shard shouldn't stay idle waiting like this for such a long time (> 30 min).
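If we do tune the rebalance limit, we were planning to change it at runtime through the cluster settings API, something like the following (the value 2 here is only an illustration, not a recommendation we are sure about):

```
PUT _cluster/settings
{
  "transient": {
    "cluster.routing.allocation.cluster_concurrent_rebalance": "2"
  }
}
```

Is that the right knob to be looking at, or should we focus on disk watermarks instead?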
Thanks