As promised, here are the results of our testing for this configuration:
Elasticsearch behaved as we expected. We ran the test today, bringing up
the additional 5 nodes while under a moderate load. Elasticsearch
immediately began relocating shards to the new nodes, limiting the transfer
to two shards at a time. While relocation ran, I/O rose by an extra 35 MB/s
across our cluster, but our load tests showed barely any decrease in
performance. The load test was running at only about 20% of our max, so we
did have excess capacity.
Our total index size was about 16 GB, including replicas, with
approximately 850K docs. Reallocation therefore had to move about 8 GB of
data to balance the shards equally across the nodes. This process took
about 4 minutes.
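As a rough sanity check on those numbers, the average relocation rate implied by moving 8 GB in 4 minutes can be worked out as below. This is only a back-of-the-envelope sketch: the function name is made up for illustration, and the 1 GB = 1024 MB convention is an assumption (the figures themselves are approximate).

```python
# Figures from the test above; both are approximate.
moved_gb = 8        # data relocated to the new nodes
duration_min = 4    # time the rebalancing took


def average_rate_mb_per_s(gigabytes, minutes, mb_per_gb=1024):
    """Average transfer rate in MB/s (hypothetical helper, 1 GB = 1024 MB assumed)."""
    return gigabytes * mb_per_gb / (minutes * 60)


rate = average_rate_mb_per_s(moved_gb, duration_min)
print(f"average relocation rate: {rate:.1f} MB/s")
```

That works out to roughly 34 MB/s, which is in line with the extra 35 MB/s of I/O we saw.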
I just want to add that this is likely not a typical setup for those using
Elasticsearch, in that we had pre-configured for 10 nodes but started out
with only 5. It might be useful where an environment is expanded and
contracted on a regular schedule within a fixed set of resources. In a
truly dynamic situation where resources need to be expanded on demand, the
multicast model would likely be needed instead of our fixed unicast
configuration.
On Thursday, August 9, 2012 5:19:52 PM UTC-4, John Nader wrote:
We are getting ready to test expanding our ES environment from 5 nodes to
10 nodes. I wanted to get a sanity check on our approach.
We currently have a test environment configured with 10 hosts using
explicit unicast IPs:
gateway.expected_nodes: 10
discovery.zen.minimum_master_nodes: 3
discovery.zen.ping.multicast.enabled: false
discovery.zen.ping.unicast.hosts: [ IP1, IP2,.....IP10 ]
and 10 shards with 2 replicas:
index.number_of_shards: 10
index.number_of_replicas: 2
However, only the first 5 nodes were up and running when we indexed our
content, resulting in 10 shards x 3 copies distributed evenly across 5
active nodes. We are leaving all 'tuning' parameters at default values
(e.g. cluster_concurrent_rebalance, recovery.concurrent_streams).
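The shard layout described above can be checked with a little arithmetic. The sketch below assumes a perfectly even spread of shard copies across nodes, which is an idealization, and the helper name is hypothetical.

```python
def copies_per_node(shards, replicas, nodes):
    """Return (total shard copies, copies per node), assuming a perfectly even spread."""
    total = shards * (1 + replicas)  # each primary plus its replicas
    return total, total / nodes


total, before = copies_per_node(shards=10, replicas=2, nodes=5)
_, after = copies_per_node(shards=10, replicas=2, nodes=10)

# Copies that must leave the 5 original nodes to reach the even 10-node layout.
relocated = int((before - after) * 5)
fraction = relocated / total

print(f"{total} copies: {before:.0f} per node on 5 nodes, {after:.0f} per node on 10")
print(f"{relocated} copies ({fraction:.0%}) relocate to the new nodes")
```

Under that assumption, half of the shard copies would move once all 10 nodes are active.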
Our plan is to activate the additional 5 nodes, and watch Elasticsearch
rebalance the shards evenly across all 10 nodes. Is this a sound approach?
Should we do anything more than simply start the additional nodes? For
example, should we disable allocation while they are coming up and then
re-enable it, for a smooth, uninterrupted rebalancing?
Feedback is appreciated. Also, I will be happy to share the results of
our experiment.