Strange rebalancing of shards


(Steff) #1

Hi

We ran a simple test of shard rebalancing.

Start-state:
One index with 3 shards (1 replica)
Two nodes:
Node1 running primary of shard1, primary of shard2 and replica of
shard3
Node2 running primary of shard3, replica of shard1 and replica of
shard2

Action:
We start a new node (Node3) that joins the cluster (multicast)

End-state (after rebalancing has finished)
Three nodes:
Node1 running replica of shard3
Node2 running replica of shard1 and replica of shard2
Node3 running primary of shard1, primary of shard2 and primary of
shard3
Basically, ALL primaries have been moved to the new node.

We think this is a very strange rebalancing decision by ES. We would
have expected something like this as the end-state (a more "even"
balance):
Three nodes:
Node1 running primary of shard1 and replica of shard3
Node2 running primary of shard3 and replica of shard2
Node3 running primary of shard2 and replica of shard1
Basically moving one primary and one replica over to Node3
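
For concreteness, the difference between the two end-states can be shown by simply counting shard copies per node. This is a small illustrative sketch; the allocations are taken verbatim from the test description above:

```python
# Compare the observed and expected end-states by counting shard copies
# per node. Allocations are copied from the test description above.

observed = {
    "Node1": ["shard3-replica"],
    "Node2": ["shard1-replica", "shard2-replica"],
    "Node3": ["shard1-primary", "shard2-primary", "shard3-primary"],
}

expected = {
    "Node1": ["shard1-primary", "shard3-replica"],
    "Node2": ["shard3-primary", "shard2-replica"],
    "Node3": ["shard2-primary", "shard1-replica"],
}

def shard_counts(state):
    """Total shard copies hosted by each node."""
    return {node: len(shards) for node, shards in state.items()}

print(shard_counts(observed))  # {'Node1': 1, 'Node2': 2, 'Node3': 3} -- uneven
print(shard_counts(expected))  # {'Node1': 2, 'Node2': 2, 'Node3': 2} -- even
```

Note that the observed end-state is uneven not only in where the primaries sit, but also in the raw number of shard copies per node (1/2/3 instead of 2/2/2).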

Any comments on this "strange" behaviour? Is it intended, or do you
have an explanation for why it behaves like this?

We repeated the test twice with the same result both times.

Regards, Per Steffensen


(Shay Banon) #2

How do you check where shards are allocated? Here is a simple gist that I
used; with 3 nodes and an index with 3 shards and 1 replica, I can see
that shards are evenly distributed across the nodes (2 shards per
node): https://gist.github.com/1296201.
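
As an aside for anyone repeating this test on a newer Elasticsearch version: shard allocation can also be inspected directly with `GET _cat/shards` and tallied per node. A minimal sketch (the sample output below is invented for illustration, and the exact column layout varies by version; the parser keys off the third column, `prirep`, and the last column, the node name):

```python
# Tally primaries and replicas per node from `_cat/shards`-style output.
# The sample text is made up for illustration; real output would come from
# something like `curl localhost:9200/_cat/shards`.

sample = """\
myindex 0 p STARTED 100 1mb 10.0.0.3 Node3
myindex 0 r STARTED 100 1mb 10.0.0.2 Node2
myindex 1 p STARTED 100 1mb 10.0.0.3 Node3
myindex 1 r STARTED 100 1mb 10.0.0.2 Node2
myindex 2 p STARTED 100 1mb 10.0.0.3 Node3
myindex 2 r STARTED 100 1mb 10.0.0.1 Node1
"""

def tally(cat_shards_text):
    """Count primary and replica copies per node."""
    counts = {}
    for line in cat_shards_text.splitlines():
        fields = line.split()
        prirep, node = fields[2], fields[-1]
        key = "primary" if prirep == "p" else "replica"
        counts.setdefault(node, {"primary": 0, "replica": 0})[key] += 1
    return counts

print(tally(sample))
```

Running this on the sample reproduces the imbalance described in the original post: all three primaries on Node3.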



(Steff) #3

On Oct 18, 8:16 pm, Shay Banon kim...@gmail.com wrote:

> How do you check where shards are allocated?

I use elasticsearch-head (http://mobz.github.com/elasticsearch-head/)

> Here is a simple gist that I used and I can see that with 3 nodes, and
> an index with 3 shards and 1 replica, shards are evenly distributed
> across the nodes (2 shards per node): https://gist.github.com/1296201.

Hmm, that is not what we saw. Let's leave this issue for now, until I
get a chance to repeat the test, see whether the problem is consistent,
and maybe find out what causes it.


(Steff) #4

gateway.expected_nodes might have been set to 2 all the way through
the test. Could that be the cause of the "problem"?

Regards, Per Steffensen


(Shay Banon) #5

No, it shouldn't. It just controls when the balancing/recovery will
start, not how it will happen.
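
For reference, the setting in question goes in elasticsearch.yml. A minimal illustrative fragment of the legacy gateway recovery settings (values are examples, not a recommendation):

```yaml
# Recovery starts immediately once this many nodes have joined.
gateway.expected_nodes: 3
# Otherwise, wait for at least this many nodes (typically combined
# with a recover_after_time timeout).
gateway.recover_after_nodes: 2
```

These settings only gate when initial recovery kicks off after a full cluster restart; where shards end up afterwards is decided separately by the shard allocator.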


