Can shard relocation go faster than 96MB/s?

The cluster has a 10 Gbit network and SSD RAID disks, but relocation speed only reaches ~100MB/s.
I have tried setting indices.recovery.max_bytes_per_sec to -1, but it had no effect.
I found @ywelsch saying "it achieved a throughput of 96MB/s" in a benchmark.
What is the bottleneck, and can we break this limit?
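
For reference, this is roughly how I applied the setting through the cluster settings API (localhost:9200 stands in for one of the nodes, and the value shown is only an example of raising the limit; I also tried -1 as mentioned above):

curl -X PUT "localhost:9200/_cluster/settings" -H 'Content-Type: application/json' -d'
{
  "transient": {
    "indices.recovery.max_bytes_per_sec": "1gb"
  }
}'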

Can you share more information about the configuration you're using (i.e. elasticsearch.yml and cluster settings)? Do you have compression or TLS enabled?

Also, are you measuring the relocation speed of a single shard? How do you measure this?

Thanks for the reply. Here are the configurations that may be related:

Compression and TLS are disabled.
cluster.routing.allocation.node_concurrent_recoveries: 4

I increased node_concurrent_recoveries to 16, and relocation speed went up to ~300MB/s (measured with iptraf-ng). That is still slow for this NIC/SSD; meanwhile CPU usage is only around 10% and load is about 3 on a 32-core machine.
Maybe relocating hundreds of shards in parallel could reach a higher speed.
I wonder why a single shard relocation cannot fully use the hardware resources. Are there other settings that affect relocation speed?
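
For completeness, this kind of change can also be made dynamically without a restart, roughly like this (the host is a placeholder for one of the nodes):

curl -X PUT "localhost:9200/_cluster/settings" -H 'Content-Type: application/json' -d'
{
  "transient": {
    "cluster.routing.allocation.node_concurrent_recoveries": 16
  }
}'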

What's the latency between the nodes? Also, which ES version are you using? If you're using 6.7+, you can start experimenting with the indices.recovery.max_concurrent_file_chunks setting (see https://www.elastic.co/guide/en/elasticsearch/reference/6.7/recovery.html) and set it to 5 (i.e. the maximum allowed).
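
Something like this should apply it dynamically (localhost:9200 is just a placeholder for one of your nodes):

curl -X PUT "localhost:9200/_cluster/settings" -H 'Content-Type: application/json' -d'
{
  "transient": {
    "indices.recovery.max_concurrent_file_chunks": 5
  }
}'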

The three nodes are on a local area network, so ping latency is about 0.1ms. I ran a network speed test with iperf and it reached 9 Gbits/s, so the network is not the problem.
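The iperf check was essentially the standard client/server pair, roughly like this (node names and duration are placeholders):

# on node1 (server)
iperf -s
# on node3 (client)
iperf -c node1 -t 30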
I tried setting max_concurrent_file_chunks to 5, but it seems to have no effect:
3 shards relocating -> 128MB/s
5 shards relocating -> 200MB/s (2 more shards started relocating after the setting change, so that looks like ~40MB/s per shard?)
Can you share the maximum relocation speed of a single shard in your benchmark environment? Thanks.

We'll have to rerun some benchmarks, which might take a while.

@LoadingZhang, can you share the recovery stats, which can be retrieved via /_recovery?
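
Something along these lines will do it (replace my_index with the index being relocated; ?human turns the raw millisecond values into human-readable durations):

curl -X GET "localhost:9200/my_index/_recovery?human&pretty"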

Sorry for the late reply. Here are the recovery stats for one of the large indices:

{
  "shards": [
    {
      "id": 0,
      "type": "PEER",
      "stage": "DONE",
      "primary": false,
      "start_time": "2019-04-05T08:18:17.846Z",
      "start_time_in_millis": 1554452297846,
      "stop_time": "2019-04-05T09:52:13.882Z",
      "stop_time_in_millis": 1554457933882,
      "total_time": "1.5h",
      "total_time_in_millis": 5636035,
      "source": {
        "name": "node1"
      },
      "target": {
        "name": "node3"
      },
      "index": {
        "size": {
          "total": "137.8gb",
          "total_in_bytes": 147990021833,
          "reused": "0b",
          "reused_in_bytes": 0,
          "recovered": "137.8gb",
          "recovered_in_bytes": 147990021833,
          "percent": "100.0%"
        },
        "files": {
          "total": 45,
          "reused": 0,
          "recovered": 45,
          "percent": "100.0%"
        },
        "total_time": "1.5h",
        "total_time_in_millis": 5634494,
        "source_throttle_time": "0s",
        "source_throttle_time_in_millis": 0,
        "target_throttle_time": "-1",
        "target_throttle_time_in_millis": 0
      },
      "translog": {
        "recovered": 0,
        "total": 0,
        "percent": "100.0%",
        "total_on_start": 0,
        "total_time": "1.5s",
        "total_time_in_millis": 1535
      },
      "verify_index": {
        "check_index_time": "0s",
        "check_index_time_in_millis": 0,
        "total_time": "0s",
        "total_time_in_millis": 0
      }
    },
    {
      "id": 2,
      "type": "PEER",
      "stage": "DONE",
      "primary": false,
      "start_time": "2019-04-05T08:18:17.838Z",
      "start_time_in_millis": 1554452297838,
      "stop_time": "2019-04-05T09:58:54.275Z",
      "stop_time_in_millis": 1554458334275,
      "total_time": "1.6h",
      "total_time_in_millis": 6036437,
      "source": {
        "name": "node3"
      },
      "target": {
        "name": "node2"
      },
      "index": {
        "size": {
          "total": "138.5gb",
          "total_in_bytes": 148721148140,
          "reused": "0b",
          "reused_in_bytes": 0,
          "recovered": "138.5gb",
          "recovered_in_bytes": 148721148140,
          "percent": "100.0%"
        },
        "files": {
          "total": 45,
          "reused": 0,
          "recovered": 45,
          "percent": "100.0%"
        },
        "total_time": "1.6h",
        "total_time_in_millis": 6034787,
        "source_throttle_time": "0s",
        "source_throttle_time_in_millis": 0,
        "target_throttle_time": "-1",
        "target_throttle_time_in_millis": 0
      },
      "translog": {
        "recovered": 0,
        "total": 0,
        "percent": "100.0%",
        "total_on_start": 0,
        "total_time": "1.1s",
        "total_time_in_millis": 1139
      },
      "verify_index": {
        "check_index_time": "0s",
        "check_index_time_in_millis": 0,
        "total_time": "0s",
        "total_time_in_millis": 0
      }
    },
    {
      "id": 1,
      "type": "PEER",
      "stage": "DONE",
      "primary": true,
      "start_time": "2019-04-05T02:39:12.697Z",
      "start_time_in_millis": 1554431952697,
      "stop_time": "2019-04-05T04:08:55.155Z",
      "stop_time_in_millis": 1554437335155,
      "total_time": "1.4h",
      "total_time_in_millis": 5382458,
      "source": {
        "name": "node3"
      },
      "target": {
        "name": "node2"
      },
      "index": {
        "size": {
          "total": "137.7gb",
          "total_in_bytes": 147891448846,
          "reused": "0b",
          "reused_in_bytes": 0,
          "recovered": "137.7gb",
          "recovered_in_bytes": 147891448846,
          "percent": "100.0%"
        },
        "files": {
          "total": 45,
          "reused": 0,
          "recovered": 45,
          "percent": "100.0%"
        },
        "total_time": "1.4h",
        "total_time_in_millis": 5381237,
        "source_throttle_time": "0s",
        "source_throttle_time_in_millis": 0,
        "target_throttle_time": "-1",
        "target_throttle_time_in_millis": 0
      },
      "translog": {
        "recovered": 0,
        "total": 0,
        "percent": "100.0%",
        "total_on_start": 0,
        "total_time": "715ms",
        "total_time_in_millis": 715
      },
      "verify_index": {
        "check_index_time": "0s",
        "check_index_time_in_millis": 0,
        "total_time": "0s",
        "total_time_in_millis": 0
      }
    },
    {
      "id": 2,
      "type": "PEER",
      "stage": "DONE",
      "primary": true,
      "start_time": "2019-04-07T04:05:31.405Z",
      "start_time_in_millis": 1554609931405,
      "stop_time": "2019-04-07T04:43:18.150Z",
      "stop_time_in_millis": 1554612198150,
      "total_time": "37.7m",
      "total_time_in_millis": 2266745,
      "source": {
        "name": "node3"
      },
      "target": {
        "name": "node1"
      },
      "index": {
        "size": {
          "total": "180.6gb",
          "total_in_bytes": 193976351223,
          "reused": "0b",
          "reused_in_bytes": 0,
          "recovered": "180.6gb",
          "recovered_in_bytes": 193976351223,
          "percent": "100.0%"
        },
        "files": {
          "total": 12,
          "reused": 0,
          "recovered": 12,
          "percent": "100.0%"
        },
        "total_time": "37.7m",
        "total_time_in_millis": 2264071,
        "source_throttle_time": "0s",
        "source_throttle_time_in_millis": 0,
        "target_throttle_time": "-1",
        "target_throttle_time_in_millis": 0
      },
      "translog": {
        "recovered": 0,
        "total": 0,
        "percent": "100.0%",
        "total_on_start": 0,
        "total_time": "2.1s",
        "total_time_in_millis": 2166
      },
      "verify_index": {
        "check_index_time": "0s",
        "check_index_time_in_millis": 0,
        "total_time": "0s",
        "total_time_in_millis": 0
      }
    }
  ]
}

@LoadingZhang Thanks for providing the stats.

I ran a recovery benchmark on GCP while I was working on https://github.com/elastic/elasticsearch/pull/36981. The recovery throughput was ~190MB/s, which was bounded by the disk (local SSD) write throughput. What hardware are you using? Can you measure your disk read/write throughput? Thank you!

I ran a benchmark with fio: sequential read and write throughput are 3494MB/s and 2179MB/s respectively, and random read/write reaches up to 434MB/s. The SSDs are in a hardware RAID0, and %util is less than 10% while ES is relocating, so I don't think the disk is the bottleneck.
I don't know whether there are other factors that could slow down relocation, such as best_compression, etc.
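
The fio runs were along these lines (the file path, size, and queue depth here are illustrative rather than the exact values I used; direct I/O bypasses the page cache):

# sequential write
fio --name=seqwrite --filename=/data/fio-test --rw=write --bs=1m --size=10g --direct=1 --ioengine=libaio --iodepth=32
# sequential read
fio --name=seqread --filename=/data/fio-test --rw=read --bs=1m --size=10g --direct=1 --ioengine=libaio --iodepth=32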

best_compression should not affect the throughput since peer recovery just copies segment files.
