Index relocation during initialization


(Itamar Syn-Hershko) #1

Hi there,

Whenever we restart a node (or a few) in a cluster, and also when we do a
full cluster restart, the cluster takes a lot of time to recover from
YELLOW state to GREEN.

What we observed is that the cluster relocates indexes when it comes up, or
when a few nodes are restarted. Since our indexes are not small, relocating
them takes time, and because one relocation triggers another (for balancing
reasons), this becomes a never-ending pursuit.

As long as the cluster is up, no relocations happen. They only happen
when one or a few nodes restart, or when the cluster restarts as a whole.
Still, a balance that is preserved while the cluster is up should also be
preserved after a small disturbance. I would only expect a rebalance
when a game-changing event happens: a large index is added or removed, or a
node permanently joins or leaves.

I'm aware of the various settings in place that should prevent that
(expected number of nodes, time to wait etc) but obviously they don't play
well.

Our hope is that the new allocation decider will help with that, but the issue
seems to originate from some sort of bug in the decision of WHEN to
try rebalancing.

I'll be happy to provide anything that could help pinpoint the issue.

Itamar.

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.


(Clinton Gormley) #2

Hi Itamar

I'm aware of the various settings in place that should prevent that
(expected number of nodes, time to wait etc) but obviously they don't play
well.

Why do you say they don't play well? You don't mention what those settings
are in your cluster, and they're not settings that we can determine
automatically because we don't know how many nodes you expect to have in
your cluster. Those settings are designed specifically to prevent the
situation you describe and in my experience, they work very well. So I'm
guessing that you don't have them configured correctly.

clint
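
For readers following along: the settings referred to here are presumably the gateway recovery settings (gateway.recover_after_nodes, gateway.expected_nodes, gateway.recover_after_time). The thread never names them, so that mapping is an assumption. Below is a minimal Python sketch of how those settings are documented to gate the start of recovery. It is a simplified model with made-up values, not Elasticsearch's actual code.

```python
from dataclasses import dataclass


@dataclass
class GatewaySettings:
    # Field names mirror the documented gateway.* settings; values are illustrative.
    recover_after_nodes: int   # minimum nodes that must join before recovery may start
    expected_nodes: int        # recovery starts immediately once this many have joined
    recover_after_time: float  # seconds to wait when expected_nodes is not reached


def should_start_recovery(settings, nodes_joined, seconds_waited):
    """Simplified model of the recovery gate (an assumption, not ES source code)."""
    if nodes_joined >= settings.expected_nodes:
        # Everyone we expect is here: no reason to wait any longer.
        return True
    if nodes_joined < settings.recover_after_nodes:
        # Too few nodes: never start, no matter how long we have waited.
        return False
    # Enough nodes to proceed, but not all of them: wait out the timer.
    return seconds_waited >= settings.recover_after_time


# Hypothetical 10-node cluster.
settings = GatewaySettings(recover_after_nodes=8, expected_nodes=10,
                           recover_after_time=300.0)

print(should_start_recovery(settings, nodes_joined=10, seconds_waited=0.0))
print(should_start_recovery(settings, nodes_joined=9, seconds_waited=60.0))
print(should_start_recovery(settings, nodes_joined=9, seconds_waited=300.0))
print(should_start_recovery(settings, nodes_joined=5, seconds_waited=1000.0))
```

Note that in this model the gate only decides when recovery begins. It says nothing about whether shards are moved for balancing once recovery is under way, which is the part of the behavior being questioned in this thread.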



(Itamar Syn-Hershko) #3

Because we've set them correctly and are still seeing this behavior. It might
be something unique to our scenario (rolling indexes, each of them quite big). I
will triple-triple-check and get back to you. I wish there was an easy way
to reproduce this in a test.

On Thu, Sep 12, 2013 at 2:32 PM, Clinton Gormley <clint@traveljury.com> wrote:

Hi Itamar

I'm aware of the various settings in place that should prevent that
(expected number of nodes, time to wait etc) but obviously they don't play
well.

Why do you say they don't play well? You don't mention what those settings
are in your cluster, and they're not settings that we can determine
automatically because we don't know how many nodes you expect to have in
your cluster. Those settings are designed specifically to prevent the
situation you describe and in my experience, they work very well. So I'm
guessing that you don't have them configured correctly.

clint



(Anantha Govindarajan) #4

Hi Clinton,

We are also facing this issue. I verified that recovery starts only after
the expected number of nodes has arrived. In the case of a full cluster
restart, all the shards initially become unavailable and the master starts
allocating the unassigned shards. During allocation, BalancedShardsAllocator
comes into play and changes the allocation that was balanced before the
full restart.


