Large/slow merge blocking node recovery during startup

We are running v0.90.1 on a 4-node cluster, with 1 data and 1 client ES
instance on each node. We have an index ~80 GB in size, with 5 shards and
3 replicas. Less than 1% of the data changes each day. All the merge
settings are at their defaults.

What we are noticing is that when we bring down ES and start it again, it
can take up to an hour for the node to start completely and go from health
status 'yellow' -> 'green'. I turned on debug trace and noticed that when
the node starts, index shards are merging, taking 17-30 minutes each. Node
recovery is blocked during this period. This seems to happen even when no
new indexing is going on.
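
For reference, here is a rough sketch of how the yellow -> green time could be
measured from a script instead of by watching the logs. It assumes the
cluster health API with its wait_for_status and timeout parameters, and a
client reachable at http://localhost:9200 (an assumption, adjust as needed).
The helper names health_url and time_until_green are made up for this
example.

```python
import json
import time
import urllib.request


def health_url(base_url, wait_for_status="green", timeout="30s"):
    """Build a cluster health URL that blocks until the status is reached
    or the server-side timeout expires."""
    return "%s/_cluster/health?wait_for_status=%s&timeout=%s" % (
        base_url.rstrip("/"), wait_for_status, timeout)


def time_until_green(base_url="http://localhost:9200", max_wait_s=7200):
    """Return the seconds elapsed until the cluster reports green,
    or None if max_wait_s passes first."""
    start = time.time()
    while time.time() - start < max_wait_s:
        try:
            with urllib.request.urlopen(health_url(base_url)) as resp:
                health = json.loads(resp.read().decode("utf-8"))
        except OSError:
            # Node not reachable yet (or the request failed); retry shortly.
            time.sleep(5)
            continue
        if health.get("status") == "green" and not health.get("timed_out"):
            return time.time() - start
    return None
```

Calling time_until_green() right after restarting the node would give one
number per restart to compare before and after any settings change.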

We were hoping that sync-up time on node startup would be very fast,
but this really slows things down and is very confusing.

  • Why is a merge happening during node startup? I would think merges
    should be scheduled for a low-activity period, which node startup
    is clearly not.
  • Why does the merge seem to block node recovery during startup?

Can we reschedule the merge to happen after node recovery completes, or
keep it from blocking node recovery? I am looking into modifying
indices.store.throttle.max_bytes_per_sec and other merge settings to speed
things up, but I feel that's not the root issue here.
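
In case it helps the discussion, this is roughly how I would change the
throttle setting through the cluster update settings API. It is only a
sketch: the 100mb value in the usage note and the helper names
throttle_settings_body and apply_throttle are made up for illustration, and
I am assuming the setting can be updated dynamically on this version.

```python
import json
import urllib.request


def throttle_settings_body(max_bytes_per_sec, throttle_type="merge",
                           persistent=False):
    """Build the request body for the cluster update settings API.
    Transient settings are lost on a full cluster restart."""
    scope = "persistent" if persistent else "transient"
    return {
        scope: {
            "indices.store.throttle.type": throttle_type,
            "indices.store.throttle.max_bytes_per_sec": max_bytes_per_sec,
        }
    }


def apply_throttle(base_url, max_bytes_per_sec):
    """PUT the throttle settings to the cluster and return the parsed reply.
    base_url is an assumption, e.g. http://localhost:9200."""
    body = json.dumps(throttle_settings_body(max_bytes_per_sec)).encode("utf-8")
    req = urllib.request.Request(
        base_url.rstrip("/") + "/_cluster/settings",
        data=body,
        headers={"Content-Type": "application/json"},
        method="PUT",
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

For example, apply_throttle("http://localhost:9200", "100mb") would raise
the limit until the next full cluster restart. This only makes the merges
faster; it would not change when they run.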

I have spent a lot of time trying to debug this issue, so any help would be
much appreciated. Thanks!
