Unexpected rebalancing behavior

Thanks for ElasticSearch, it's grand.

I'm running ES 0.18.6, under Sun's JDK in Ubuntu 11.04 on EC2, in a 6-
node cluster, near-default config: only setting cluster.name,
network.host, discovery.zen.ping.unicast.hosts, and path.data. None of
the shard allocation or rebalancing settings have been touched. The
cluster is set up to use a single index, so 10 shards are in play.
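
For concreteness, each node's elasticsearch.yml looks roughly like this (the name, IPs, and path below are placeholders, not my real values):

  cluster.name: my-cluster
  network.host: 10.0.0.1            # this node's private IP
  discovery.zen.ping.unicast.hosts: ["10.0.0.1", "10.0.0.2", "10.0.0.3"]
  path.data: /mnt/elasticsearch/data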

Whenever this index is (re)created, it appears as evenly allocated as
it can be for that configuration, with 1 or 2 shards per node. That's
what I'd expect to always be the case when rebalancing is complete.

However, when rebalancing does occur (nodes entering/leaving the
cluster) I frequently end up unbalanced, with most nodes only holding
1 shard and one or two of them holding 3 or 4 shards.
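
For what it's worth, I'm judging "balanced" by the per-node shard counts shown in elasticsearch-head, and occasionally by eyeballing the routing tables in the raw cluster state, along the lines of:

  curl -s 'http://localhost:9200/_cluster/state?pretty=true'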

Is this a known behavior of the current algorithm? Everything I've
read in the docs + mailing list says that balancing is supposed to go
by number of shards per node (vs. e.g. the oft-discussed balancing by
data weight), but I wonder if I am misunderstanding something.

Many thanks,
Jeff F.

The cluster should eventually balance itself out; are you sure there are no relocating shards when you see it?

Hi Shay, thanks for the reply.

On Feb 12, 3:44 am, Shay Banon kim...@gmail.com wrote:

The cluster should eventually balance itself out; are you sure there are no relocating shards when you see it?

Insofar as elasticsearch-head (and the raw JSON status output, when I
look at it) reports all shards as green / STARTED, yes.
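
Concretely, I'm polling cluster health and watching its status, relocating_shards, and initializing_shards fields with something like:

  curl -s 'http://localhost:9200/_cluster/health?pretty=true'

plus the per-shard view in elasticsearch-head.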

In other words, what happens is:

  • Cluster is stable/balanced
  • Node drops out
  • Cluster redistributes that node's shards to existing nodes (so,
    these shards show up as e.g. yellow / INITIALIZING on their new hosts)
  • Rebalancing appears to finish after some time, with everything back
    to green / STARTED state.
  • I watch the cluster for another ~5, 15, even 30 minutes, and no
    further rebalancing seems to occur.

My assumption is that once all shards return to STARTED, rebalancing is "done" and no more rebalancing will occur until more nodes enter or leave. The docs appear to imply this, given the settings one can tweak to control when rebalancing is allowed.
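
The settings I have in mind are, if I'm reading the docs right, ones like these (shown at what I believe are their defaults):

  cluster.routing.allocation.allow_rebalance: indices_all_active
  cluster.routing.allocation.cluster_concurrent_rebalance: 2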

Is that correct, or am I just not waiting long enough? E.g. will
rebalancing run periodically in addition to being triggered by cluster
state changes?

Thanks,
Jeff

On Feb 13, 10:41 am, Jeff Forcier bitprop...@gmail.com wrote:

Is that correct, or am I just not waiting long enough? E.g. will
rebalancing run periodically in addition to being triggered by cluster
state changes?

Replying to add: this morning I ran additional tests, the primary difference being that dropped nodes stayed out of the cluster instead of being re-added quickly (i.e. halting the ES process, vs. restarting it as I was doing last week). The rebalancing appears to be working better now, though it's unclear whether that was due to the above difference, to simply giving it more time to balance out, or to luck.

Thanks again for the help -- I'll follow up if I am able to reproduce
last week's imbalance issues later on.

-Jeff
