Time to move an elastic node is quite long

(Johnny BARRAY) #1

Hi ECE Team,

I'm still testing ECE, and I tried to migrate a node from one allocator to another.

My elastic node is empty (except for monitoring data) and was using 89 MB of 96 GB before the move.

The overall task took 3 minutes and 38 seconds (from 08:13:00 to 08:16:38).
The longest step was:
2017-06-21T08:13:47.396Z Starting step: [migrate-data]: fromInstances=List(instance-0000000002)
Completed step: [migrate-data] with result: ()
This step took 2m 3s.

My ECE runs on 4 physical boxes. What is the expected time for such an operation? Is there any throttle that limits the migration of data? If so, how can I adjust it?

Thanks for your feedback.


(Alex Piggott) #2

I normally count on a couple of minutes + data transfer time (max ~50MB/s, but slower with lots of small shards) + snapshot time if used (can take several minutes on larger clusters).
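
As a rough sketch of that rule of thumb: the function name and parameters below are hypothetical, and the 120 s default is only my reading of "a couple of minutes", not an ECE setting. Since 50 MB/s is the maximum rate, the result is closer to a lower bound than a prediction.

```python
def estimate_migration_seconds(data_mb, setup_s=120.0,
                               throughput_mb_s=50.0, snapshot_s=0.0):
    """Back-of-the-envelope migration time: setup + transfer + snapshot.

    All defaults are illustrative assumptions, not values read from ECE.
    """
    if throughput_mb_s <= 0:
        raise ValueError("throughput_mb_s must be positive")
    transfer_s = data_mb / throughput_mb_s
    return setup_s + transfer_s + snapshot_s


# An 89 MB node: the transfer itself is under 2 s, so setup time dominates.
small = estimate_migration_seconds(89)

# A 100 GB node: the transfer now dwarfs the fixed setup cost.
large = estimate_migration_seconds(100 * 1024)

print(f"89 MB: ~{small:.0f} s, 100 GB: ~{large:.0f} s")
```

With these assumed numbers, the small node comes out at about two minutes, almost all of it fixed overhead.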

So your times seem pretty standard (I just moved a 1GB node with a few days of monitoring data on my slowish setup, and the migrate step took 5m). You can use the _cat/recovery endpoint to follow what's going on (and in fact the overview page of the UI displays that in graphical form).
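
A minimal sketch of polling that endpoint from Python, assuming the cluster is reachable at a base URL you supply and needs no authentication (on a real ECE deployment you would most likely have to add credentials). The helper names are made up for illustration; `format=json` and `active_only=true` are standard `_cat/recovery` parameters.

```python
import json
import urllib.request


def fetch_recoveries(base_url):
    """Return the active shard recoveries as a list of dicts.

    base_url is a placeholder for your cluster endpoint.
    """
    url = base_url.rstrip("/") + "/_cat/recovery?format=json&active_only=true"
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)


def summarize(rows):
    """Format one line per recovering shard: index[shard] stage bytes%."""
    return [
        f"{r['index']}[{r['shard']}] {r['stage']} {r['bytes_percent']}"
        for r in rows
    ]


# Usage (needs a reachable cluster, so it is left commented out):
# for line in summarize(fetch_recoveries("http://localhost:9200")):
#     print(line)
```

Running this every few seconds during the migrate-data step shows which shards are still being copied and how far along they are.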

There are a few things going on:

  • The main thing is that moving nodes has a longish constant setup time regardless of size, but on typical clusters with 10s or 100s of GB (per shard) that time is not a dominating factor.
  • The larger the cluster, the more CPU it has assigned to it, and the faster the migrations will occur. (You can manually turn the CPU metering on/off for a given cluster from the advanced cluster configuration, resources.cpu.hard_limit - that's often something we do to migrate data off small and overloaded clusters)
  • Also, if you have HA clusters, migration is faster because the data is already available on another node (i.e. from the other replica).
  • Having physical machines with SSDs does make a decent difference, I think. On our SaaS infrastructure, I'd expect the test migration I did to take ~1 minute or less.


(Johnny BARRAY) #3

OK, thanks for those explanations.

I understand that my test is not representative, given the size of my cluster... which is good news.

