Elasticsearch 7.16 shard recovery slow

hi,

The Elasticsearch cluster has 6 hot nodes and 4 cold nodes. One cold node was removed because of a hardware failure, so a lot of missing replica shards (about 20TB) began to recover. But I found the recovery process was very slow and lasted about 4 days.

Take one shard recovery as an example: a 49.9gb shard recovery lasted for more than 10 hours.
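
For reference, stats like the JSON below can be pulled from the recovery APIs; a minimal sketch (localhost:9200 and <index> are placeholders for one of my nodes and the affected index):

curl -s 'localhost:9200/<index>/_recovery?human&pretty'
curl -s 'localhost:9200/_cat/recovery?v&active_only=true&h=index,shard,time,stage,source_node,target_node,bytes_percent'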

{
  "id": 2,
  "type": "PEER",
  "stage": "INDEX",
  "primary": false,
  "start_time": "2023-12-26T00:59:05.021Z",
  "start_time_in_millis": 1703552345021,
  "total_time": "10.7h",
  "total_time_in_millis": 38671947,
  "source": { - 
    "id": "ixxUhwHFSqKjcVm_QkprxA",
    "host": "13.50.32.24",
    "transport_address": "13.50.32.24:9301",
    "ip": "13.50.32.24",
    "name": "node-2-cold"
  },
  "target": { - 
    "id": "rS7gKUTFRNyJfpRhnzjmCg",
    "host": "13.50.32.25",
    "transport_address": "13.50.32.25:9301",
    "ip": "13.50.32.25",
    "name": "node-3-cold"
  },
  "index": { - 
    "size": { - 
      "total": "49.9gb",
      "total_in_bytes": 53665251764,
      "reused": "0b",
      "reused_in_bytes": 0,
      "recovered": "22.9gb",
      "recovered_in_bytes": 24670228482,
      "recovered_from_snapshot": "0b",
      "recovered_from_snapshot_in_bytes": 0,
      "percent": "46.0%"
    },
    "files": { - 
      "total": 238,
      "reused": 0,
      "recovered": 221,
      "percent": "92.9%"
    },
    "total_time": "10.7h",
    "total_time_in_millis": 38662530,
    "source_throttle_time": "31s",
    "source_throttle_time_in_millis": 31097,
    "target_throttle_time": "30.7m",
    "target_throttle_time_in_millis": 1843266
  },
  "translog": { - 
    "recovered": 0,
    "total": 0,
    "percent": "100.0%",
    "total_on_start": 0,
    "total_time": "0s",
    "total_time_in_millis": 0
  },
  "verify_index": { - 
    "check_index_time": "0s",
    "check_index_time_in_millis": 0,
    "total_time": "0s",
    "total_time_in_millis": 0
  }
}

I tried to adjust the following settings, but it didn't help:

"cluster.routing.allocation.node_concurrent_recoveries": 50
"indices.recovery.max_bytes_per_sec" : "1000mb"

No matter how I changed the settings, the recovery didn't speed up.

Elasticsearch version 7.16. The cold node hardware:

cpu: 64 cores
memory: 180GB
storage: RAID0 (7.3TB HDD * 10)
network: 10GB * 2

I am quite sure the host load is very low.

Any ideas? Thank you.

I saw a similar issue, Elasticsearch 6.3.0 shard recovery is slow, where setting transport.tcp.compress to false fixed the problem.
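
If transport compression were the suspect here, the currently configured value could be checked with the node info API; a sketch, assuming the default HTTP port (only explicitly configured settings show up in the output):

curl -s 'localhost:9200/_nodes/settings?filter_path=nodes.*.settings.transport&pretty'

As far as I know this is a static setting (named transport.compress in 7.x), so changing it means editing elasticsearch.yml and restarting the nodes.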

I suspect this is the problem: HDDs are just not very fast. They're particularly bad at concurrent IO, so node_concurrent_recoveries: 50 is going to make things a whole lot worse.


Thank you for your quick reply.

I updated the settings:

"cluster.routing.allocation.node_concurrent_recoveries": 2
"indices.recovery.max_bytes_per_sec" : "1000mb"

but shard recovery was still slow.

{
  "id": 0,
  "type": "PEER",
  "stage": "INDEX",
  "primary": true,
  "start_time": "2023-12-27T04:56:29.122Z",
  "start_time_in_millis": 1703652989122,
  "total_time": "37.6m",
  "total_time_in_millis": 2258017,
  "source": { - 
    "id": "Tlqnxqj4T9ey359n2jt4Bw",
    "host": "13.50.32.23",
    "transport_address": "13.50.32.23:9300",
    "ip": "13.50.32.23",
    "name": "node-1"
  },
  "target": { - 
    "id": "ixxUhwHFSqKjcVm_QkprxA",
    "host": "13.50.32.24",
    "transport_address": "13.50.32.24:9301",
    "ip": "13.50.32.24",
    "name": "node-2-cold"
  },
  "index": { - 
    "size": { - 
      "total": "51gb",
      "total_in_bytes": 54843923353,
      "reused": "0b",
      "reused_in_bytes": 0,
      "recovered": "50.4gb",
      "recovered_in_bytes": 54123145777,
      "recovered_from_snapshot": "0b",
      "recovered_from_snapshot_in_bytes": 0,
      "percent": "98.7%"
    },
    "files": { - 
      "total": 208,
      "reused": 0,
      "recovered": 207,
      "percent": "99.5%"
    },
    "total_time": "37.6m",
    "total_time_in_millis": 2257942,
    "source_throttle_time": "201.6ms",
    "source_throttle_time_in_millis": 201,
    "target_throttle_time": "1s",
    "target_throttle_time_in_millis": 1043
  },
  "translog": { - 
    "recovered": 0,
    "total": 0,
    "percent": "100.0%",
    "total_on_start": 0,
    "total_time": "0s",
    "total_time_in_millis": 0
  },
  "verify_index": { - 
    "check_index_time": "0s",
    "check_index_time_in_millis": 0,
    "total_time": "0s",
    "total_time_in_millis": 0
  }
}

Recovering this 51GB shard took about 37.6 minutes (roughly 24 MB/s).

What does await and disk utilisation look like on the cold nodes while recovery is ongoing?
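
For example (a sketch, assuming the sysstat package is installed), sampling extended per-device stats every 5 seconds while a recovery is running:

iostat -xm 5

The columns of interest are %util and await (r_await/w_await on newer sysstat versions).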

Here is a snapshot of the iostat output at its busiest; the disks are almost idle the rest of the time.

sdk to sdj are the HDD disks.
md1 is the RAID0 array built from these HDDs.

hi,
I found the root cause.

The transport network between the nodes has a bandwidth of only 1GB, and the recovery traffic was hitting that limit.
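
Assuming that 1GB refers to a 1 Gbit/s link, its ceiling is about 1 Gbit/s ÷ 8 ≈ 125 MB/s before protocol overhead, far below the 1000mb indices.recovery.max_bytes_per_sec limit I had configured, so the network rather than the disks or the throttle settings was the real bottleneck.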

Sorry to interrupt.

20TiB in 4 days is just 60MiB/s.

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.