Good afternoon.
Current configuration:
ES version 6.8.8 in docker "docker.elastic.co/elasticsearch/elasticsearch:6.8.8"
heap_size: 31g
"number_of_nodes" : 3,
"number_of_data_nodes" : 3,
"active_primary_shards" : 1253,
"active_shards" : 2232
Average size per index: ~28 GB.
Curator version 5.8.4
We use Curator to shrink old indices: instead of "3 primaries, 1 replica", the shrunken index gets "1 primary, 1 replica".
This is where we run into a problem.
Curator creates a copy of the old index with the suffix "-shrink", then creates the primary shard and allocates it successfully. But when it tries to allocate the replica shard, we get this error:
{
"index" : "example-index-2021-09-29-shrink",
"shard" : 0,
"primary" : false,
"current_state" : "unassigned",
"unassigned_info" : {
"reason" : "ALLOCATION_FAILED",
"at" : "2021-11-23T12:26:19.515Z",
"failed_allocation_attempts" : 1,
"details" : "failed shard on node [8r_zhRD4RDm2peWnDun_3w]: failed recovery, failure RecoveryFailedException[[example-index-2021-09-29-shrink][0]: Recovery failed from {node15}{nWOPSov3TFKUunoiooVxMQ}{PSAfiXvZQx-NLyKpnXGs1A}{192.168.0.164}{192.168.0.164:9300}{ml.machine_memory=135291469824, ml.max_open_jobs=20, xpack.installed=true, ml.enabled=true} into {node13}{8r_zhRD4RDm2peWnDun_3w}{KU0HhEPMQ_ilSV3RCe4XNw}{192.168.0.162}{192.168.0.162:9300}{ml.machine_memory=135291469824, xpack.installed=true, ml.max_open_jobs=20, ml.enabled=true}]; nested: RemoteTransportException[[node15][172.17.0.3:9300][internal:index/shard/recovery/start_recovery]]; nested: RecoveryEngineException[Phase[1] phase1 failed]; nested: RecoverFilesRecoveryException[Failed to transfer [85] files with total size of [24.8gb]]; nested: ReceiveTimeoutTransportException[[node13][192.168.0.162:9300][internal:index/shard/recovery/file_chunk] request_id [1586168734] timed out after [899897ms]]; ",
"last_allocation_status" : "no_attempt"
},
"can_allocate" : "no",
"allocate_explanation" : "cannot allocate because allocation is not permitted to any of the nodes",
"node_allocation_decisions" : [
{
"node_id" : "8r_zhRD4RDm2peWnDun_3w",
"node_name" : "node13",
"transport_address" : "192.168.0.162:9300",
"node_attributes" : {
"ml.machine_memory" : "135291469824",
"xpack.installed" : "true",
"ml.max_open_jobs" : "20",
"ml.enabled" : "true"
},
"node_decision" : "no",
"deciders" : [
{
"decider" : "max_retry",
"decision" : "NO",
"explanation" : "shard has exceeded the maximum number of retries [1] on failed allocation attempts - manually call [/_cluster/reroute?retry_failed=true] to retry, [unassigned_info[[reason=ALLOCATION_FAILED], at[2021-11-23T12:26:19.515Z], failed_attempts[1], delayed=false, details[failed shard on node [8r_zhRD4RDm2peWnDun_3w]: failed recovery, failure RecoveryFailedException[[example-index-2021-09-29-shrink][0]: Recovery failed from {node15}{nWOPSov3TFKUunoiooVxMQ}{PSAfiXvZQx-NLyKpnXGs1A}{192.168.0.164}{192.168.0.164:9300}{ml.machine_memory=135291469824, ml.max_open_jobs=20, xpack.installed=true, ml.enabled=true} into {node13}{8r_zhRD4RDm2peWnDun_3w}{KU0HhEPMQ_ilSV3RCe4XNw}{192.168.0.162}{192.168.0.162:9300}{ml.machine_memory=135291469824, xpack.installed=true, ml.max_open_jobs=20, ml.enabled=true}]; nested: RemoteTransportException[[node15][172.17.0.3:9300][internal:index/shard/recovery/start_recovery]]; nested: RecoveryEngineException[Phase[1] phase1 failed]; nested: RecoverFilesRecoveryException[Failed to transfer [85] files with total size of [24.8gb]]; nested: ReceiveTimeoutTransportException[[node13][192.168.0.162:9300][internal:index/shard/recovery/file_chunk] request_id [1586168734] timed out after [899897ms]]; ], allocation_status[no_attempt]]]"
}
]
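The max_retry decider message above asks for a manual call to /_cluster/reroute?retry_failed=true. As a minimal sketch (not a verified fix for the recovery timeout itself), the retry limit could also be raised on the already-shrunken index through the index settings API before retrying, assuming index.allocation.max_retries can be updated on an existing index. The base URL and the helper name build_retry_requests are placeholders chosen for illustration; the requests are only built here, not sent.

```python
import json
import urllib.request

ES_URL = "http://localhost:9200"  # assumption: cluster address, not taken from the post


def build_retry_requests(index, max_retries=5, base_url=ES_URL):
    """Build (without sending) two requests: raise the retry limit, then retry allocation."""
    settings_body = json.dumps({"index.allocation.max_retries": max_retries}).encode()
    # PUT /<index>/_settings raises the per-index retry limit.
    raise_limit = urllib.request.Request(
        f"{base_url}/{index}/_settings",
        data=settings_body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )
    # POST /_cluster/reroute?retry_failed=true is the call named in the decider message.
    retry = urllib.request.Request(
        f"{base_url}/_cluster/reroute?retry_failed=true",
        method="POST",
    )
    return raise_limit, retry


if __name__ == "__main__":
    for req in build_retry_requests("example-index-2021-09-29-shrink"):
        # Only print the requests; pass each one to urllib.request.urlopen to send it.
        print(req.get_method(), req.full_url)
```

Building the requests separately from sending them makes it easy to check the URLs and body before touching the cluster.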
I've tried to create a template like this:
"shrink" : {
"order" : 0,
"index_patterns" : [
"*-shrink"
],
"settings" : {
"index" : {
"allocation" : {
"max_retries" : "5"
}
}
}
}
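For reference, a sketch of how a template like the one above could be registered through the legacy PUT /_template/<name> endpoint. The base URL and the helper name build_template_request are assumptions made for illustration; the body mirrors the template shown above, and the request is only built, not sent.

```python
import json
import urllib.request


def build_template_request(base_url="http://localhost:9200"):
    """Build (without sending) a PUT /_template/shrink request for the template above."""
    template = {
        "order": 0,
        "index_patterns": ["*-shrink"],
        "settings": {"index": {"allocation": {"max_retries": "5"}}},
    }
    return urllib.request.Request(
        f"{base_url}/_template/shrink",  # "shrink" is the template name used in the post
        data=json.dumps(template).encode(),
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


if __name__ == "__main__":
    req = build_template_request()
    # Only print the request; pass it to urllib.request.urlopen to send it.
    print(req.get_method(), req.full_url)
    print(req.data.decode())
```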
But it doesn't work. Here are the index settings after a successful shrink:
GET /example-index-shrink/_settings
{
"example-index-shrink" : {
"settings" : {
"index" : {
"allocation" : {
"max_retries" : "1"
},
"shrink" : {
"source" : {
"name" : "example-index",
"uuid" : "mecKKzDDTzu77ViMv5N3EA"
}
},
"blocks" : {
"write" : null
},
"provided_name" : "example-index-shrink",
"creation_date" : "1637751350836",
"number_of_replicas" : "1",
"uuid" : "MI_wbW35R8ubkYZOySfp1g",
"version" : {
"created" : "6080899",
"upgraded" : "6080899"
},
"codec" : "best_compression",
"routing" : {
"allocation" : {
"initial_recovery" : {
"_id" : "nWOPSov3TFKUunoiooVxMQ"
},
"require" : {
"_name" : null
}
}
},
"number_of_shards" : "1",
"routing_partition_size" : "1",
"resize" : {
"source" : {
"name" : "example-index",
"uuid" : "mecKKzDDTzu77ViMv5N3EA"
}
}
}
}
}
}
How can I change the index.allocation.max_retries value for shrunken indices?
I can't find such a setting in the Curator action file.
Thanks in advance.