Can shard relocation go faster than 96MB/s?

The cluster has a 10 Gbit network and SSD RAID disks, but relocation speed only reaches ~100MB/s.
I have tried setting indices.recovery.max_bytes_per_sec to -1, but it had no effect.
I found @ywelsch saying "it achieved a throughput of 96MB/s" in a benchmark.
What is the bottleneck, and can we break this limit?
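
For reference, this is roughly how I applied the setting through the cluster settings API (localhost:9200 stands in for one of the nodes, and the value shown is only an example of raising the limit; I also tried -1 as mentioned above):

curl -X PUT "localhost:9200/_cluster/settings" -H 'Content-Type: application/json' -d'
{
  "transient": {
    "indices.recovery.max_bytes_per_sec": "1gb"
  }
}'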

Can you share more information about the configuration you're using (i.e. elasticsearch.yml and cluster settings)? Do you have compression or TLS enabled?

Also, are you measuring the relocation speed of a single shard? How do you measure this?

Thanks for the reply. Here are the configurations that may be related:

Compression and TLS are disabled.
cluster.routing.allocation.node_concurrent_recoveries: 4

I increased node_concurrent_recoveries to 16, and relocation speed went up to ~300MB/s (measured with iptraf-ng). That is still slow for this NIC/SSD; meanwhile CPU usage is only around 10% and load is about 3 on a 32-core machine.
Maybe relocating hundreds of shards in parallel could reach a higher speed.
I wonder why a single shard relocation cannot fully use the hardware resources. Are there other settings that affect relocation speed?
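
For completeness, this kind of change can also be made dynamically without a restart, roughly like this (the host is a placeholder for one of the nodes):

curl -X PUT "localhost:9200/_cluster/settings" -H 'Content-Type: application/json' -d'
{
  "transient": {
    "cluster.routing.allocation.node_concurrent_recoveries": 16
  }
}'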

What's the latency between the nodes? Also, which ES version are you using? If you're using 6.7+, you can start experimenting with the indices.recovery.max_concurrent_file_chunks setting (see https://www.elastic.co/guide/en/elasticsearch/reference/6.7/recovery.html) and set it to 5 (i.e. the maximum allowed).
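
Something like this should apply it dynamically (localhost:9200 is just a placeholder for one of your nodes):

curl -X PUT "localhost:9200/_cluster/settings" -H 'Content-Type: application/json' -d'
{
  "transient": {
    "indices.recovery.max_concurrent_file_chunks": 5
  }
}'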

The three nodes are on a local area network, so ping latency is about 0.1ms. I ran a network speed test with iperf and it reached 9 Gbits/s, so the network is not the problem.
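The iperf check was essentially the standard client/server pair, roughly like this (node names and duration are placeholders):

# on node1 (server)
iperf -s
# on node3 (client)
iperf -c node1 -t 30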
I tried setting max_concurrent_file_chunks to 5, but it seems to have no effect:
3 shards relocating -> 128MB/s
5 shards relocating -> 200MB/s (2 more shards started relocating after the setting change, so that looks like ~40MB/s per shard?)
Can you share the maximum relocation speed of a single shard in your benchmark environment? Thanks.

We'll have to rerun some benchmarks, which might take a while.

@LoadingZhang, can you share the recovery stats, which can be retrieved via /_recovery?
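
Something along these lines will do it (replace my_index with the index being relocated; ?human turns the raw millisecond values into human-readable durations):

curl -X GET "localhost:9200/my_index/_recovery?human&pretty"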

Sorry for the late reply. Here are the recovery stats for one of the large indices:

{
  "shards": [
    {
      "id": 0,
      "type": "PEER",
      "stage": "DONE",
      "primary": false,
      "start_time": "2019-04-05T08:18:17.846Z",
      "start_time_in_millis": 1554452297846,
      "stop_time": "2019-04-05T09:52:13.882Z",
      "stop_time_in_millis": 1554457933882,
      "total_time": "1.5h",
      "total_time_in_millis": 5636035,
      "source": {
        "name": "node1"
      },
      "target": {
        "name": "node3"
      },
      "index": {
        "size": {
          "total": "137.8gb",
          "total_in_bytes": 147990021833,
          "reused": "0b",
          "reused_in_bytes": 0,
          "recovered": "137.8gb",
          "recovered_in_bytes": 147990021833,
          "percent": "100.0%"
        },
        "files": {
          "total": 45,
          "reused": 0,
          "recovered": 45,
          "percent": "100.0%"
        },
        "total_time": "1.5h",
        "total_time_in_millis": 5634494,
        "source_throttle_time": "0s",
        "source_throttle_time_in_millis": 0,
        "target_throttle_time": "-1",
        "target_throttle_time_in_millis": 0
      },
      "translog": {
        "recovered": 0,
        "total": 0,
        "percent": "100.0%",
        "total_on_start": 0,
        "total_time": "1.5s",
        "total_time_in_millis": 1535
      },
      "verify_index": {
        "check_index_time": "0s",
        "check_index_time_in_millis": 0,
        "total_time": "0s",
        "total_time_in_millis": 0
      }
    },
    {
      "id": 2,
      "type": "PEER",
      "stage": "DONE",
      "primary": false,
      "start_time": "2019-04-05T08:18:17.838Z",
      "start_time_in_millis": 1554452297838,
      "stop_time": "2019-04-05T09:58:54.275Z",
      "stop_time_in_millis": 1554458334275,
      "total_time": "1.6h",
      "total_time_in_millis": 6036437,
      "source": {
        "name": "node3"
      },
      "target": {
        "name": "node2"
      },
      "index": {
        "size": {
          "total": "138.5gb",
          "total_in_bytes": 148721148140,
          "reused": "0b",
          "reused_in_bytes": 0,
          "recovered": "138.5gb",
          "recovered_in_bytes": 148721148140,
          "percent": "100.0%"
        },
        "files": {
          "total": 45,
          "reused": 0,
          "recovered": 45,
          "percent": "100.0%"
        },
        "total_time": "1.6h",
        "total_time_in_millis": 6034787,
        "source_throttle_time": "0s",
        "source_throttle_time_in_millis": 0,
        "target_throttle_time": "-1",
        "target_throttle_time_in_millis": 0
      },
      "translog": {
        "recovered": 0,
        "total": 0,
        "percent": "100.0%",
        "total_on_start": 0,
        "total_time": "1.1s",
        "total_time_in_millis": 1139
      },
      "verify_index": {
        "check_index_time": "0s",
        "check_index_time_in_millis": 0,
        "total_time": "0s",
        "total_time_in_millis": 0
      }
    },
    {
      "id": 1,
      "type": "PEER",
      "stage": "DONE",
      "primary": true,
      "start_time": "2019-04-05T02:39:12.697Z",
      "start_time_in_millis": 1554431952697,
      "stop_time": "2019-04-05T04:08:55.155Z",
      "stop_time_in_millis": 1554437335155,
      "total_time": "1.4h",
      "total_time_in_millis": 5382458,
      "source": {
        "name": "node3"
      },
      "target": {
        "name": "node2"
      },
      "index": {
        "size": {
          "total": "137.7gb",
          "total_in_bytes": 147891448846,
          "reused": "0b",
          "reused_in_bytes": 0,
          "recovered": "137.7gb",
          "recovered_in_bytes": 147891448846,
          "percent": "100.0%"
        },
        "files": {
          "total": 45,
          "reused": 0,
          "recovered": 45,
          "percent": "100.0%"
        },
        "total_time": "1.4h",
        "total_time_in_millis": 5381237,
        "source_throttle_time": "0s",
        "source_throttle_time_in_millis": 0,
        "target_throttle_time": "-1",
        "target_throttle_time_in_millis": 0
      },
      "translog": {
        "recovered": 0,
        "total": 0,
        "percent": "100.0%",
        "total_on_start": 0,
        "total_time": "715ms",
        "total_time_in_millis": 715
      },
      "verify_index": {
        "check_index_time": "0s",
        "check_index_time_in_millis": 0,
        "total_time": "0s",
        "total_time_in_millis": 0
      }
    },
    {
      "id": 2,
      "type": "PEER",
      "stage": "DONE",
      "primary": true,
      "start_time": "2019-04-07T04:05:31.405Z",
      "start_time_in_millis": 1554609931405,
      "stop_time": "2019-04-07T04:43:18.150Z",
      "stop_time_in_millis": 1554612198150,
      "total_time": "37.7m",
      "total_time_in_millis": 2266745,
      "source": {
        "name": "node3"
      },
      "target": {
        "name": "node1"
      },
      "index": {
        "size": {
          "total": "180.6gb",
          "total_in_bytes": 193976351223,
          "reused": "0b",
          "reused_in_bytes": 0,
          "recovered": "180.6gb",
          "recovered_in_bytes": 193976351223,
          "percent": "100.0%"
        },
        "files": {
          "total": 12,
          "reused": 0,
          "recovered": 12,
          "percent": "100.0%"
        },
        "total_time": "37.7m",
        "total_time_in_millis": 2264071,
        "source_throttle_time": "0s",
        "source_throttle_time_in_millis": 0,
        "target_throttle_time": "-1",
        "target_throttle_time_in_millis": 0
      },
      "translog": {
        "recovered": 0,
        "total": 0,
        "percent": "100.0%",
        "total_on_start": 0,
        "total_time": "2.1s",
        "total_time_in_millis": 2166
      },
      "verify_index": {
        "check_index_time": "0s",
        "check_index_time_in_millis": 0,
        "total_time": "0s",
        "total_time_in_millis": 0
      }
    }
  ]
}

@LoadingZhang Thanks for providing the stats.

I ran a recovery benchmark on GCP while I was working on https://github.com/elastic/elasticsearch/pull/36981. The recovery throughput was ~190MB/s, which was bounded by the disk (local SSD) write throughput. What hardware are you using? Can you measure your disk read/write throughput? Thank you!

I ran a benchmark with fio: sequential read and write throughput are 3494MB/s and 2179MB/s respectively, and random read/write reaches up to 434MB/s. The SSDs are in a hardware RAID0, and %util is less than 10% while ES is relocating, so I don't think the disk is the bottleneck.
I don't know whether there are other factors that could slow down relocation, such as best_compression, etc.
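
The fio runs were along these lines (the file path, size, and queue depth here are illustrative rather than the exact values I used; direct I/O bypasses the page cache):

# sequential write
fio --name=seqwrite --filename=/data/fio-test --rw=write --bs=1m --size=10g --direct=1 --ioengine=libaio --iodepth=32
# sequential read
fio --name=seqread --filename=/data/fio-test --rw=read --bs=1m --size=10g --direct=1 --ioengine=libaio --iodepth=32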

best_compression should not affect the throughput since peer recovery just copies segment files.
