Hey folks,
Not a bug, but something I'd definitely like to understand. I've noticed
that when I restart nodes, even with shard allocation disabled, the amount
of time it takes to initialize all shards again seems to be somehow related
to the amount of time since the last restart. The more time since the last
restart, the longer it takes to initialize all shards. What's going on?
Is there some log that's being checked? Is there any way to make this
process faster by doing something before restarting? If it makes a
difference, I'm on a 15-node cluster with ~2k shards (3 replicas) and between
2k and 20M docs per shard. Restarts are done with the service wrapper.
Please check the mailing list archives; I asked a similar question several
months ago. Mine wasn't about the time between restarts, but about preventing
ES from moving shards around after a restart. You can tell it not to do that
and avoid having shards shuffled around, which will speed things up.
You don't mention shard shuffling; I'm just assuming that's what's going
on. If you grab SPM for ES, you'll see a time-series graph of how many
shards are in each state and for how long, which may be enlightening here.
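For reference, here is a minimal Python sketch of toggling allocation around a restart through the cluster update-settings API (`PUT /_cluster/settings`). The setting name `cluster.routing.allocation.disable_allocation` is an assumption based on the 0.19/0.20-era releases (later versions replaced it with `cluster.routing.allocation.enable`), and the `localhost:9200` address and the helper names are hypothetical. It only builds the request; nothing is sent until you call `urlopen()`.

```python
import json
import urllib.request

# Assumption: an Elasticsearch node reachable over HTTP at this address.
ES_URL = "http://localhost:9200"

# Assumption: 0.19/0.20-era setting name; on later versions the equivalent
# is "cluster.routing.allocation.enable" with values such as "none" / "all".
DISABLE_KEY = "cluster.routing.allocation.disable_allocation"


def allocation_settings(disable: bool) -> dict:
    """Body for the cluster update-settings API.

    A "transient" setting is dropped on a full cluster restart, so a
    forgotten "disabled" flag cannot outlive the cluster.
    """
    return {"transient": {DISABLE_KEY: disable}}


def build_request(disable: bool, base_url: str = ES_URL) -> urllib.request.Request:
    """Build (but do not send) the PUT /_cluster/settings request."""
    body = json.dumps(allocation_settings(disable)).encode("utf-8")
    return urllib.request.Request(
        base_url + "/_cluster/settings",
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )
```

Usage: send `urllib.request.urlopen(build_request(True))` before stopping the node, then `urllib.request.urlopen(build_request(False))` once it has rejoined the cluster.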
(In reply to Tim J's message of Thursday, October 11, 2012 3:21:10 PM UTC-4.)
Actually, Tim did mention that he had shard allocation disabled.
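Tim, yes: the restarted nodes are copying newer segments across from the
running nodes. While a node is down, the shard copies on the other nodes keep
indexing and merging, so their segment files drift away from the ones on the
stopped node's disk. The longer the time between shutdown and restart, the
more segments have changed, and the more copying has to take place.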
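A toy Python model of why the copy grows with the gap: treat each shard copy as a set of segment files and copy whatever the primary has that the restarted copy does not. This is a hypothetical simplification (files are matched on name and size only), not Elasticsearch's actual recovery code, and the file names and sizes are made up for illustration.

```python
from typing import Dict

# Toy model (not Elasticsearch code): a shard copy maps each segment
# file name to its size in bytes. All names and sizes are hypothetical.
SegmentFiles = Dict[str, int]


def files_to_copy(primary: SegmentFiles, replica: SegmentFiles) -> SegmentFiles:
    """Files the restarted copy must fetch: anything the primary has that
    the replica lacks, or holds with a different size."""
    return {
        name: size
        for name, size in primary.items()
        if replica.get(name) != size
    }


def bytes_to_copy(primary: SegmentFiles, replica: SegmentFiles) -> int:
    """Total bytes transferred to bring the replica in line."""
    return sum(files_to_copy(primary, replica).values())


# Segments on the stopped node's disk.
before = {"_0.cfs": 500, "_1.cfs": 300}
# Short outage: the primary only flushed one small new segment.
after_short_outage = {**before, "_2.cfs": 20}
# Long outage: merges replaced the old segments entirely.
after_long_outage = {"_3.cfs": 820, "_4.cfs": 40}

print(bytes_to_copy(after_short_outage, before))  # 20
print(bytes_to_copy(after_long_outage, before))   # 860
```

Once a merge rewrites segments the stopped node already had, even unchanged documents have to be copied again, which is why the cost climbs faster than the amount of new data.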