We are using v0.90.1 on a 4-node cluster, with 1 data and 1 client ES
instance running on each node. We have an index of ~80 GB, with 5 shards
and 3 replicas each. Less than 1% of the data changes each day. All the
merge settings are default.
What we are noticing is that when we bring down ES and start it again, it
can take up to an hour for this index to start completely and go from
status 'yellow' to 'green'. I turned on debug tracing and noticed that when
the node starts, each shard is merging, which takes 17-30 minutes. This
seems to happen even when no new indexing is going on.
We were hoping that sync-up time on node startup would be very fast,
but this really slows things down and is very confusing.
Why is a merge operation happening during node startup? I would think a
merge should be scheduled for a low-activity period, which node startup
clearly is not.
Why does the merge seem to block node recovery during startup?
Can we reschedule the merge to happen after node recovery completes, or
make it not block node recovery? I have spent a lot of time trying to debug
this issue, so any help would be much appreciated. Thanks!
By default, recovery is throttled in Elasticsearch 0.90. Maybe this kicks
in in your case (if you are using SSDs, the throttled rate is really slow)?
Do you have some monitoring in place to find out the current read speeds?
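To get a rough feel for whether the throttle alone could explain the
timings, here is a minimal back-of-the-envelope sketch. It assumes the
~80 GB is spread evenly over the 5 shards (16 GB each) and that a whole
shard is copied at the throttled rate, which may not be what actually
happens in your recovery. The setting name
indices.recovery.max_bytes_per_sec and the 20mb default are from my
recollection of 0.90, so please check them against the docs for your
version. The helper names are made up for illustration.

```python
import json


def estimate_recovery_minutes(shard_size_gb, throttle_mb_per_sec):
    """Minutes needed to copy one shard at a fixed throttle.

    Ignores merging, network limits and any segment reuse, so it is
    only a rough figure for comparison with observed timings.
    """
    if throttle_mb_per_sec <= 0:
        raise ValueError("throttle_mb_per_sec must be positive")
    shard_size_mb = shard_size_gb * 1024
    return shard_size_mb / throttle_mb_per_sec / 60


def recovery_throttle_settings(limit):
    """Request body for a transient cluster settings update.

    Assumption: the recovery throttle is controlled by
    indices.recovery.max_bytes_per_sec and accepts values like "100mb".
    """
    return {"transient": {"indices.recovery.max_bytes_per_sec": limit}}


if __name__ == "__main__":
    per_shard_gb = 80 / 5  # assumed even split of the ~80 GB index
    for limit_mb in (20, 100):
        minutes = estimate_recovery_minutes(per_shard_gb, limit_mb)
        print("%d MB/s: about %.1f minutes per shard" % (limit_mb, minutes))
    # Body that could be sent with PUT to /_cluster/settings
    print(json.dumps(recovery_throttle_settings("100mb")))
```

If the estimate at the assumed default comes out close to the per-shard
times you see in the logs, the throttle is worth ruling out before
digging further into the merge scheduling.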