Restarts take forever, even with shard allocation disabled, when a node hasn't been restarted recently


(Tim J) #1

Hey folks,
Not a bug, but something I'd definitely like to understand. I've noticed
that when I restart nodes, even with shard allocation disabled, the amount
of time it takes to initialize all shards again seems to be somehow related
to the amount of time since the last restart. The more time since the last
restart, the longer it takes to initialize all shards. What's going on?
Is there some log that's being checked? Is there any way to make this
process faster by doing something before restarting? If it makes a
difference, I'm on a 15-node cluster with ~2k shards (3 replicas), with
between 2k and 20m docs per shard. Restarts are done with the service wrapper.

Thanks!
-Tim

--


(Otis Gospodnetić) #2

Hi,

Please check the mailing list archives - I asked a similar question several
months ago. Nothing related to time between restarts, but related to
preventing ES from moving shards around after a restart. You can tell it not
to do that and avoid having shards shuffled around, which will speed things
up. You don't mention shard shuffling, I'm just assuming that's what's going
on. If you grab SPM for ES you'll see a timeseries graph that shows this
sort of stuff, so you will see how many shards are in what state and for
how long, which may be enlightening here.
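
If you'd rather check shard states by hand, here is a minimal Python sketch that reads the counts from the cluster health API. It assumes a node reachable at http://localhost:9200 (adjust for your setup), and the helper names are made up for illustration. With allocation disabled, unassigned shards can stay non-zero until you re-enable it, so the "in flight" check only looks at initializing and relocating shards.

```python
import json
from urllib.request import urlopen

# Shard-count fields reported by GET /_cluster/health.
SHARD_STATE_FIELDS = (
    "active_shards",
    "initializing_shards",
    "relocating_shards",
    "unassigned_shards",
)


def shard_state_counts(health):
    """Pick the per-state shard counts out of a parsed health response."""
    return {field: health.get(field, 0) for field in SHARD_STATE_FIELDS}


def nothing_in_flight(health):
    """True when no shard is currently initializing or relocating."""
    counts = shard_state_counts(health)
    return counts["initializing_shards"] == 0 and counts["relocating_shards"] == 0


def fetch_health(base_url="http://localhost:9200"):
    """Fetch and parse cluster health. The base URL is an assumption."""
    with urlopen(base_url + "/_cluster/health") as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Calling `shard_state_counts(fetch_health())` every few seconds during a restart gives a rough picture of how long shards sit in each state.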

Otis

Search Analytics - http://sematext.com/search-analytics/index.html
Performance Monitoring - http://sematext.com/spm/index.html

(In reply to Tim J's post of Thursday, October 11, 2012 3:21:10 PM UTC-4.)

--


(Clinton Gormley) #3

Hiya

> Nothing related to time between restarts, but related to preventing
> ES from moving shards around after restart. You can tell it not to do
> that and avoid having shards being shuffled around, which will speed
> things up. You don't mention shard shuffling, I'm just assuming
> that's what's going on.

Actually, he did mention disabling shard allocation.

Tim, yes - the restarted nodes are copying across newer segments from
the running nodes. So the longer the time between shutdown and restart,
the more segment copying has to take place.
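
To make that concrete, here is a toy Python model of the idea. It is not Elasticsearch's recovery code: the function name, file names and checksums are all invented for illustration. A file already on the restarted node is reused only if the running copy still has an identical file; everything else is copied. Since Lucene segments are immutable and merges replace old segments with new ones, fewer of the local files tend to still match as time goes on.

```python
def segments_to_copy(source_files, local_files):
    """Return the names of files the restarted node would need to fetch.

    Both arguments map file name -> checksum (hypothetical values).
    A file is reused only when the name and checksum both match.
    """
    return sorted(
        name
        for name, checksum in source_files.items()
        if local_files.get(name) != checksum
    )


# What the restarted node still has on disk (made-up example).
on_disk = {"_0": "aaa", "_1": "bbb"}

# Running copy shortly after the node went down: one new segment.
short_gap = {"_0": "aaa", "_1": "bbb", "_2": "ccc"}

# Running copy much later: merges have replaced the old segments.
long_gap = {"_5": "xxx", "_6": "yyy"}

few = segments_to_copy(short_gap, on_disk)
many = segments_to_copy(long_gap, on_disk)
```

In the first case only the new segment is copied. In the second nothing on disk can be reused, so the whole shard comes across the wire.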

clint

--

