I have a cluster with 2 dedicated data nodes, each with 16TB of storage. After some time, I added 4 more dedicated data nodes, each with 16TB of storage as well. I keep 30 days of indexes open, and the rest of the indexes (about 5 months' worth) are closed.
What I see is that the original 2 nodes only have about 2TB free now, whereas the new nodes have about 8TB free. It appears that all data that existed prior to the 4 new nodes remained on the 2 old nodes, and all new data has been spread evenly across all 6.
What I would like to do is rebalance the data nodes so that each holds roughly the same amount of data, and thus has roughly the same amount of free disk space.
What would be the best way to accomplish this task?
I did a little poking around on the file system, and found this structure under my data directory:
root@bdprodes05:[631]:/derby/data/disk1/elasticsearch/elasticsearch-prod/nodes/0> ls -l
total 132
drwxr-xr-x 2299 elasticsearch elasticsearch 126976 Sep 26 03:12 indices
-rw-r--r-- 1 elasticsearch elasticsearch 0 Oct 3 2014 node.lock
drwxr-xr-x 2 elasticsearch elasticsearch 4096 Jun 24 17:24 _state
Then within "indices", I can sort by age of indexes with a simple "ls -lrt". Then within the directory of the various indexes, I can see what appears to be the shard numbers that reside on this server:
root@bdprodes05:[645]:/derby/data/disk1/elasticsearch/elasticsearch-prod/nodes/0/indices/myindex-20150330> ls -l
total 12
drwxr-xr-x 5 elasticsearch elasticsearch 4096 Mar 29 23:41 4
drwxr-xr-x 5 elasticsearch elasticsearch 4096 Mar 29 23:41 5
drwxr-xr-x 2 elasticsearch elasticsearch 4096 Apr 30 06:10 _state
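The same inspection can be scripted instead of done by hand with ls. This is only a minimal sketch: it assumes the on-disk layout shown in the listings above (one directory per index, with numerically named subdirectories for the shards), and shards_on_disk is a hypothetical helper name, not anything shipped with Elasticsearch.

```python
import os


def shards_on_disk(indices_dir):
    """Map each index directory name to the sorted shard numbers found under it.

    Assumes the layout seen in the listings: <indices_dir>/<index>/<shard number>.
    Non-numeric entries such as _state are ignored.
    """
    result = {}
    for index_name in sorted(os.listdir(indices_dir)):
        index_path = os.path.join(indices_dir, index_name)
        if not os.path.isdir(index_path):
            continue
        shards = sorted(
            int(entry)
            for entry in os.listdir(index_path)
            if entry.isdigit() and os.path.isdir(os.path.join(index_path, entry))
        )
        result[index_name] = shards
    return result


if __name__ == "__main__":
    import sys

    # Pass the path to the node's "indices" directory as the only argument.
    for name, shards in shards_on_disk(sys.argv[1]).items():
        print(name, shards)
```

Running it against the "indices" directory on each data node and comparing the output would show which indexes keep their shards only on the two original nodes.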
I wonder if stopping the source node and destination node, then simply scp'ing the index from one place to another is enough? Very manual, and I would prefer ES to do this work for me somehow, but I'm not sure that is possible.
I'm also wondering about the various _state directories and what will happen to them if I move indexes out from under them.
All of the indexes I want to move are closed, not open.
Am I on the right track? Can anyone offer some help?
Wow. Excellent suggestion. I was able to find the indexes which had all shards on those servers, and opened one. It opened, and rebalanced across all 6. Sweet. So, it's just a matter of going through them and doing that process.
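For anyone wanting to script that process, here is a rough sketch of the open, wait, close cycle. It assumes the cluster is reachable over HTTP at a base URL you supply, uses the open index, close index and cluster health REST endpoints, and treats "no relocating or initializing shards" in the health response as the sign that rebalancing has finished. The function names, polling interval and poll limit are all my own choices for illustration, and a single quiet health check may not be a reliable signal on a busy cluster.

```python
import json
import time
import urllib.request


def http_json(method, url):
    """Send a request with no body and decode the JSON response."""
    req = urllib.request.Request(url, method=method)
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))


def cluster_is_settled(health):
    """True when the health response reports no shards relocating or initializing."""
    return (
        health.get("relocating_shards", 0) == 0
        and health.get("initializing_shards", 0) == 0
    )


def rebalance_closed_index(base_url, index, call=http_json,
                           poll_seconds=30, max_polls=240):
    """Open a closed index, wait for shard movement to stop, then close it again.

    Returns True if the cluster settled and the index was closed again.
    Returns False if max_polls ran out; the index is then left open so that
    nothing gets closed in the middle of a relocation.
    """
    call("POST", f"{base_url}/{index}/_open")
    for _ in range(max_polls):
        # Sleep before the first check so the cluster has time to start moving shards.
        time.sleep(poll_seconds)
        health = call("GET", f"{base_url}/_cluster/health")
        if cluster_is_settled(health):
            call("POST", f"{base_url}/{index}/_close")
            return True
    return False
```

Calling rebalance_closed_index("http://localhost:9200", "myindex-20150330") for one index at a time keeps the amount of data in flight small; the host and port here are placeholders.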
What would be even cooler is if there were a way to ask ES to rebalance an index automatically, even if it was closed.