I have a twenty-node cluster using the hot-cold architecture, running Elasticsearch 6.2.3. Here is the breakdown:
- 12 Hot Nodes: 16GB memory each, 9GB for JVM heap
- 8 Cold Nodes: 12GB memory each, 9GB for JVM heap
I'm using the default circuit breaker settings. Lately, whenever Elasticsearch relocates shards to the cold nodes, the parent circuit breaker trips, and it happens constantly. Here is an excerpt of what /_cluster/allocation/explain returns:
$ curl 'http://192.168.254.240:9200/_cluster/allocation/explain?pretty'
{
  "index" : "rec-19-58150",
  "shard" : 0,
  "primary" : false,
  "current_state" : "unassigned",
  "unassigned_info" : {
    "reason" : "ALLOCATION_FAILED",
    "at" : "2018-03-27T18:23:12.250Z",
    "failed_allocation_attempts" : 5,
    "details" : "failed shard on node [65Cu-mwsQi6X_Zf3_JPEMw]: failed recovery, failure RecoveryFailedException[[rec-19-58150][0]: Recovery failed from {allsight-node-slow-5}{9BTs_Ix7S82lVIDoNCHDgg}{8XHHmeuhSRyTn2Gzj5g_-A}{192.168.254.224}{192.168.254.224:9300}{speed=slow} into {allsight-node-slow-4}{65Cu-mwsQi6X_Zf3_JPEMw}{uoQ-H5CyQxyTGAQKQjjkTw}{192.168.254.223}{192.168.254.223:9300}{speed=slow}]; nested: RemoteTransportException[[allsight-node-slow-5][192.168.254.224:9300][internal:index/shard/recovery/start_recovery]]; nested: CircuitBreakingException[[parent] Data too large, data for [<transport_request>] would be [6753420497/6.2gb], which is larger than the limit of [6752370688/6.2gb]]; ",
    "last_allocation_status" : "no_attempt"
  },
  "can_allocate" : "no",
  "allocate_explanation" : "cannot allocate because allocation is not permitted to any of the nodes",
  "node_allocation_decisions" : [
    {
      "node_id" : "1778qoEdSJeEqC-v5-797w",
      "node_name" : "allsight-node-slow-6",
      "transport_address" : "192.168.254.225:9300",
      "node_attributes" : {
        "speed" : "slow"
      },
      "node_decision" : "no",
      "deciders" : [
        {
          "decider" : "max_retry",
          "decision" : "NO",
          "explanation" : "shard has exceeded the maximum number of retries [5] on failed allocation attempts - manually call [/_cluster/reroute?retry_failed=true] to retry, [unassigned_info[[reason=ALLOCATION_FAILED], at[2018-03-27T18:23:12.250Z], failed_attempts[5], delayed=false, details[failed shard on node [65Cu-mwsQi6X_Zf3_JPEMw]: failed recovery, failure RecoveryFailedException[[rec-19-58150][0]: Recovery failed from {allsight-node-slow-5}{9BTs_Ix7S82lVIDoNCHDgg}{8XHHmeuhSRyTn2Gzj5g_-A}{192.168.254.224}{192.168.254.224:9300}{speed=slow} into {allsight-node-slow-4}{65Cu-mwsQi6X_Zf3_JPEMw}{uoQ-H5CyQxyTGAQKQjjkTw}{192.168.254.223}{192.168.254.223:9300}{speed=slow}]; nested: RemoteTransportException[[allsight-node-slow-5][192.168.254.224:9300][internal:index/shard/recovery/start_recovery]]; nested: CircuitBreakingException[[parent] Data too large, data for [<transport_request>] would be [6753420497/6.2gb], which is larger than the limit of [6752370688/6.2gb]]; ], allocation_status[no_attempt]]]"
        },
        {
          "decider" : "throttling",
          "decision" : "THROTTLE",
          "explanation" : "reached the limit of incoming shard recoveries [2], cluster setting [cluster.routing.allocation.node_concurrent_incoming_recoveries=2] (can also be set via [cluster.routing.allocation.node_concurrent_recoveries])"
        }
      ]
    },
    ...
I don't understand why the circuit breaker would trip for relocations at all. A relocation should just copy segment files over the network, which ought to be possible in constant memory. I would appreciate any help resolving this. Let me know if there is more information I should provide.
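
For reference, the two byte counts in the CircuitBreakingException can be sanity-checked with a bit of shell arithmetic. This is only a sketch: it assumes the parent breaker limit is at its 6.x default of 70% of the JVM heap (indices.breaker.total.limit), and the variable names are my own, not anything Elasticsearch defines.

```shell
#!/bin/sh
# Values copied from the CircuitBreakingException in the output above.
limit=6752370688      # parent breaker limit, in bytes
would_be=6753420497   # estimated usage had the transport request been accepted

# How far past the limit the request would have pushed the node.
overshoot=$((would_be - limit))

# Assumption: limit is 70% of the heap, so heap = limit * 10 / 7.
implied_heap=$((limit * 10 / 7))

echo "overshoot: ${overshoot} bytes"
echo "implied heap: ${implied_heap} bytes (~$((implied_heap / 1024 / 1024)) MiB)"
```

The overshoot works out to roughly 1 MB and the implied heap to just under 9 GiB, which is consistent with the 9GB heap on these nodes. If the assumption holds, the sending node was already sitting at the parent limit before the recovery request arrived. Per-node breaker usage can be read from GET /_nodes/stats/breaker.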