Slow Merge operation blocking node recovery on startup


(Ankush Jhalani) #1

We are using v0.90.1 on a 4-node cluster, with 1 data and 1 client ES
instance running on each node. We have an index of ~80 GB, with 5 shards,
each with 3 replicas. Less than 1% of the data changes every day. All the
merge settings are default.

What we are noticing is that when we bring down ES and start it again, it
can take up to an hour for this index to start completely and go from status
'yellow' -> 'green'. I turned on debug tracing and noticed that on starting
the node, each shard is merging, which takes 17-30 minutes per shard. This
seems to happen even when no new indexing is going on.

We were hoping that the sync-up time on node startup would be very short,
but this really slows things down and is very confusing.

  • Why is a merge operation happening during node startup? I would think
    a merge should be scheduled for a low-activity period, which node startup
    clearly is not.
  • Why does the merge seem to block node recovery during startup?

Can we reschedule the merge to happen after node recovery completes, or
keep it from blocking node recovery? I have spent a lot of time trying to
debug this issue, so any help would be much appreciated. Thanks!

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.


(Alexander Reelsen) #2

Hey,

By default, recovery is throttled in Elasticsearch 0.90. Maybe this is
kicking in in your case? (If you are using SSDs, the default limit is really
slow.) Do you have some monitoring in place to find out current read speeds?

See the recovery section at
http://www.elasticsearch.org/guide/reference/modules/indices/
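
If the throttle does turn out to be the bottleneck, raising it could look
roughly like the sketch below. This is only an illustration under some
assumptions: that the setting is named indices.recovery.max_bytes_per_sec
(please check the page linked above for your exact version), that it can be
changed through the cluster update settings API, and that a node is
reachable at http://localhost:9200. The helper name and the "100mb" value
are hypothetical. The request is built but not sent.

```python
import json
import urllib.request


def build_recovery_throttle_request(base_url, max_bytes_per_sec):
    """Build (but do not send) a PUT request for the cluster settings API.

    Assumption: the recovery throttle is controlled by the setting
    'indices.recovery.max_bytes_per_sec'; verify the name for your version.
    """
    body = {
        "transient": {
            "indices.recovery.max_bytes_per_sec": max_bytes_per_sec,
        }
    }
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/_cluster/settings",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


# Hypothetical host and value, for illustration only.
req = build_recovery_throttle_request("http://localhost:9200", "100mb")
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))

# To apply it against a live cluster, send the request explicitly:
# urllib.request.urlopen(req)
```

A transient setting is used here so that the change does not survive a full
cluster restart, which makes it easier to experiment and back out.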

--Alex

In reply to the message from Nakul (ankush.jhalani@gmail.com) of Wed, Sep 18, 2013 at 5:29 PM.

