Shard rebalancing on single-node cluster scaling

I started out with a single Elasticsearch node (Node 1).

Version: 2.1.0
5 primary shards, 1 replica
I have 30+ days of logs in here; the total size of my data on Node 1 is 340+ GB.

I now intend to add two more nodes to this cluster, as I need to quickly start ingesting a whole lot more data into Elasticsearch.

I added Node 2, with the same configuration as above. I changed the following in the elasticsearch.yml file (sketched below):

cluster.name: the same on Node 1 and Node 2
discovery.zen.ping.unicast.hosts: Node 1 refers to Node 2 and vice versa
discovery.zen.minimum_master_nodes: 2
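
Roughly, the relevant part of Node 2's elasticsearch.yml as a sketch (Node 1 mirrors it with the host and node name swapped; the names here are illustrative):

# elasticsearch.yml on Node 2 -- Node 1 lists node-2-hostname instead
cluster.name: production-debugging
node.name: production-debugging-node-2
discovery.zen.ping.unicast.hosts: ["node-1-hostname"]
discovery.zen.minimum_master_nodes: 2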

I start up Node 1 first, followed by Node 2.
I also set the following cluster properties:
cluster.routing.rebalance.enable: all
cluster.routing.allocation.allow_rebalance: always

Node 1 and Node 2 discovered each other and joined the same cluster. However, no shards were rebalanced and moved to Node 2; all the data is still on Node 1.

I then tweaked the disk high watermark and low watermark to help trigger the relocation. The master node reports the following in the logs:

[2016-04-17 13:44:32,499][WARN ][cluster.routing.allocation.decider] [production-debugging-node-1] high disk watermark [35%] exceeded on [M8cyrMtDRUm00syJbBhB8A][production-debugging-node-1][/opt/elasticsearch/data/production-debugging/nodes/0] free: 871.7gb[49.2%], shards will be relocated away from this node

However, no relocation begins; all the shards are still on Node 1. Am I missing something here?

Where did you set that?

@warkolm - Via the cluster settings API.

PUT http://:9200/_cluster/settings
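
Something along these lines; the host is a placeholder and the low-watermark value is just an example (the 35% high watermark is the one that shows up in the log above):

# host is a placeholder; "transient" could equally be "persistent"
curl -XPUT 'http://localhost:9200/_cluster/settings' -d '{
  "transient": {
    "cluster.routing.rebalance.enable": "all",
    "cluster.routing.allocation.allow_rebalance": "always",
    "cluster.routing.allocation.disk.watermark.low": "30%",
    "cluster.routing.allocation.disk.watermark.high": "35%"
  }
}'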

You shouldn't need to do that; as soon as you bring the other node(s) in, it'll start to rebalance.

What does _cat/shards look like?
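For example (host is a placeholder):
curl 'http://localhost:9200/_cat/shards?v'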

All replicas are unassigned and all primary shards remain on Node 1. Here's a sample for a single index; this is true for all of them.

logstash-kafka-aivs-2016.03.18 1 p STARTED 40672 92.5mb production-debugging-node-1
logstash-kafka-aivs-2016.03.18 1 r UNASSIGNED
logstash-kafka-aivs-2016.03.18 2 p STARTED 40319 124.4mb production-debugging-node-1
logstash-kafka-aivs-2016.03.18 2 r UNASSIGNED
logstash-kafka-aivs-2016.03.18 3 p STARTED 40790 120.9mb production-debugging-node-1
logstash-kafka-aivs-2016.03.18 3 r UNASSIGNED
logstash-kafka-aivs-2016.03.18 4 p STARTED 40430 131.7mb production-debugging-node-1
logstash-kafka-aivs-2016.03.18 4 r UNASSIGNED
logstash-kafka-aivs-2016.03.18 0 p STARTED 40555 118.1mb production-debugging-node-1
logstash-kafka-aivs-2016.03.18 0 r UNASSIGNED

And does _cat/nodes show all 3 in the cluster?

I haven't added the third yet. I understood that it's better to add one node at a time. Is that not the case? Should I add both new nodes at once?

What does _cat/nodes show then?

host ip heap.percent ram.percent load node.role master name
node-2-hostname node-2-IP 1 4 0.06 d m production-debugging-node-2
node-1-hostname node-1-IP 71 96 1.17 d * production-debugging-node-1

Oh boy, does this mean the newly added node-2 is my master at present...?

Can you verify that both nodes are running exactly the same version of Elasticsearch by running _cat/nodes?h=h,n,v,r,m? You stated that both nodes are on 2.1.0, but mixed Elasticsearch versions in a cluster can prevent shards from being relocated between nodes (shards will not move from a newer node to an older one), so I'd like to rule that out.

Node-2 production-debugging-node-2 2.0.0 d m
Node-1 production-debugging-node-1 2.1.0 d *

Darn it, you are right. Not sure how I got here. Thanks a ton! I'll fix the versions.

Once I do so, do you recommend I add one node at a time or two nodes at once?