Increasing shard relocation speed

I have ES 2.3.3 running with 6 data nodes. Each index has one replica. One of the data nodes was down for a day. Now it's back up and ES has started relocating shards to it.

The thing is that it only relocates two shards at a time. Each node holds about 1TB of data, so it looks like this will take many hours. How can I increase this number to speed the process up?

P.S. I've also set indices.recovery.max_bytes_per_sec to 200mb, though I see that the Java process on the recovering node writes only 70-80 MB/s (and I've tested that my disks can sustain 200+ MB/s).

Thanks!

You could try one of these settings: https://www.elastic.co/guide/en/elasticsearch/reference/2.3/recovery.html#recovery

  • indices.recovery.concurrent_streams
  • indices.recovery.concurrent_small_file_streams

concurrent_streams was already set to 5. I set concurrent_small_file_streams to 5 as well, but it's still only two shards at a time.

Have a look at the allocation decider cluster.routing.allocation.node_concurrent_recoveries:
How many concurrent shard recoveries are allowed to happen on a node. Defaults to 2.

https://www.elastic.co/guide/en/elasticsearch/reference/2.3/shards-allocation.html

Changed it to 10. My cluster settings now look like the ones below. To test this, I evicted one node by excluding it through cluster.routing.allocation.exclude._ip. However, there were still only 2 relocating shards in ES at a time.

I think this is because the settings you've mentioned relate to recovery, while what I'm experiencing is shard relocation. I.e. I'm joining a new node to the cluster (to scale out) and only two shards at a time are being moved to it.

So my original question still stands: how can I boost shard relocation speed?

My cluster settings:

{
  "persistent" : {
    "cluster" : {
      "routing" : {
        "allocation" : {
          "node_concurrent_recoveries" : "10",
          "node_initial_primaries_recoveries" : "10"
        }
      }
    },
    "discovery" : {
      "zen" : {
        "minimum_master_nodes" : "2"
      }
    },
    "indices" : {
      "recovery" : {
        "concurrent_small_file_streams" : "5",
        "concurrent_streams" : "5",
        "max_bytes_per_sec" : "200mb"
      }
    }
  },
  "transient" : {
    "cluster" : {
      "routing" : {
        "allocation" : {
          "enable" : "all",
          "exclude" : {
            "_ip" : ""
          }
        }
      }
    }
  }
}
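
For anyone comparing this output against the dotted setting names used in the replies, here is a minimal sketch that flattens the persistent section into dotted keys. `flatten` is a hypothetical helper written for this illustration, not part of any Elasticsearch client. Settings that do not show up in the output have not been set explicitly, so they are still at their defaults.

```python
def flatten(settings, prefix=""):
    """Flatten nested settings into dotted keys (hypothetical helper)."""
    flat = {}
    for key, value in settings.items():
        name = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            # Recurse into nested sections, carrying the dotted prefix along.
            flat.update(flatten(value, name))
        else:
            flat[name] = value
    return flat


# The "persistent" section copied from the settings posted above.
persistent = {
    "cluster": {"routing": {"allocation": {
        "node_concurrent_recoveries": "10",
        "node_initial_primaries_recoveries": "10",
    }}},
    "discovery": {"zen": {"minimum_master_nodes": "2"}},
    "indices": {"recovery": {
        "concurrent_small_file_streams": "5",
        "concurrent_streams": "5",
        "max_bytes_per_sec": "200mb",
    }},
}

flat = flatten(persistent)
for name in sorted(flat):
    print(name, "=", flat[name])
```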

On the page I linked, you can find the setting cluster.routing.allocation.cluster_concurrent_rebalance:

Allow to control how many concurrent shard rebalances are allowed cluster wide. Defaults to 2.

Relocations also count as recoveries, so you should still increase cluster.routing.allocation.node_concurrent_recoveries if all relocations go to one node.
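
As a rough sketch, both settings could be raised together through the cluster settings API (`PUT /_cluster/settings`). The base URL `http://localhost:9200`, the helper names `build_settings_body` and `apply_settings`, and the value 10 are assumptions for illustration only; 10 simply mirrors the value already tried in this thread and is not a recommendation.

```python
import json
import urllib.request


def build_settings_body(rebalance=10, node_recoveries=10, scope="transient"):
    """Build the JSON body for PUT /_cluster/settings (hypothetical helper)."""
    if scope not in ("transient", "persistent"):
        raise ValueError("scope must be 'transient' or 'persistent'")
    return {
        scope: {
            # Cluster-wide cap on concurrent shard rebalances (default 2).
            "cluster.routing.allocation.cluster_concurrent_rebalance": rebalance,
            # Per-node cap on concurrent shard recoveries (default 2).
            "cluster.routing.allocation.node_concurrent_recoveries": node_recoveries,
        }
    }


def apply_settings(body, base_url="http://localhost:9200"):
    """Send the body to the cluster; base_url is an assumed address."""
    request = urllib.request.Request(
        base_url + "/_cluster/settings",
        data=json.dumps(body).encode("utf-8"),
        method="PUT",
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


body = build_settings_body()
print(json.dumps(body, indent=2))
# apply_settings(body)  # uncomment to send; needs a reachable cluster
```

Transient settings are lost on a full cluster restart, which may be what you want for a one-off rebalance; pass `scope="persistent"` to keep them.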

Yeehaa! Set this to 10 and it did the trick! 10 relocating shards.

Thank you very much for your help.
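
For anyone who wants to confirm the same result on their own cluster, the cluster health API (`GET /_cluster/health`) reports a `relocating_shards` count. The sketch below reads that field; the base URL `http://localhost:9200` and the helper names are assumptions for illustration.

```python
import json
import urllib.request


def relocating_shards(health):
    """Return the relocating shard count from a parsed cluster health reply."""
    return int(health.get("relocating_shards", 0))


def fetch_health(base_url="http://localhost:9200"):
    """Fetch and parse cluster health; base_url is an assumed address."""
    with urllib.request.urlopen(base_url + "/_cluster/health") as response:
        return json.loads(response.read().decode("utf-8"))


# print(relocating_shards(fetch_health()))  # needs a reachable cluster
```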