I'm wondering how you guys are handling data rebalancing on large clusters. We have 3 master nodes, 3 hot nodes, and 6 cold nodes. Some cold nodes have 4T disks and a few have 2T. ES rebalances shards based on the number of shards per node, so from time to time we need to manually relocate heavy shards between nodes. We are not able to control shard sizes; some are 200G, some 10G. We are still on ES 6.8 and are planning an upgrade to 7.x, where we can use index lifecycle management (ILM).
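For context, this is roughly what our manual relocation looks like today: just the stock `_cluster/reroute` API wrapped in a small Python helper. The host, index name, shard number, and node names below are made-up placeholders for our setup:

```python
# Minimal sketch of a manual shard move via _cluster/reroute (works on 6.8).
# ES host, index, shard number, and node names are placeholders.
import requests

ES = "http://localhost:9200"

def move_shard(index, shard, from_node, to_node):
    """Move one shard copy from a full cold node to an emptier one."""
    body = {
        "commands": [
            {
                "move": {
                    "index": index,
                    "shard": shard,
                    "from_node": from_node,
                    "to_node": to_node,
                }
            }
        ]
    }
    resp = requests.post(f"{ES}/_cluster/reroute", json=body)
    resp.raise_for_status()
    return resp.json()

# e.g. move shard 0 of a heavy daily index off cold-1 onto cold-4
move_shard("logs-2019.06.01", 0, "cold-1", "cold-4")
```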
I've read about a plugin that enables rebalancing based on disk usage; another option is a cron job with a script that rebalances the data.
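If we went the cron route, I imagine something like the sketch below: read per-node disk usage from `_cat/allocation`, and if the fullest cold node is far above the emptiest one, move its largest shard over. The gap threshold, the "cold" node-name prefix, and the lack of error handling are all assumptions, not a finished script:

```python
# Rough sketch of a cron-driven, disk-usage-based rebalance pass.
# Thresholds and the "cold" node-name prefix are placeholders.
import requests

ES = "http://localhost:9200"
GAP_PERCENT = 15  # act when fullest and emptiest cold node differ by this much

def cold_nodes_by_disk():
    """Cold data nodes sorted by disk.percent, emptiest first."""
    rows = requests.get(f"{ES}/_cat/allocation?format=json&bytes=b").json()
    cold = [r for r in rows if r["node"].startswith("cold")]
    return sorted(cold, key=lambda r: int(r["disk.percent"]))

def largest_shard_on(node):
    """Biggest STARTED shard copy currently on the given node."""
    rows = requests.get(f"{ES}/_cat/shards?format=json&bytes=b").json()
    mine = [r for r in rows if r["node"] == node and r["state"] == "STARTED"]
    return max(mine, key=lambda r: int(r["store"] or 0))

def rebalance_once():
    nodes = cold_nodes_by_disk()
    emptiest, fullest = nodes[0], nodes[-1]
    if int(fullest["disk.percent"]) - int(emptiest["disk.percent"]) < GAP_PERCENT:
        return  # balanced enough, nothing to do
    shard = largest_shard_on(fullest["node"])
    body = {"commands": [{"move": {
        "index": shard["index"],
        "shard": int(shard["shard"]),
        "from_node": fullest["node"],
        "to_node": emptiest["node"],
    }}]}
    requests.post(f"{ES}/_cluster/reroute", json=body).raise_for_status()

if __name__ == "__main__":
    rebalance_once()  # run from cron, e.g. every 30 minutes
```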
What is your approach, and why did you choose it?
Thanks,
We had the same problem, but with a smaller amount of data, and from my point of view there is only one way to fix it: change the shard size. Maybe I'm wrong, but I haven't found another solution. Why can't you change the shard size for new indexes? Have you implemented any deletion policies for existing indexes?
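A minimal sketch of what I mean, assuming your daily indexes match a `logs-*` pattern and you can estimate daily volume: derive `number_of_shards` for new indexes from the expected index size so each shard lands near a target size. The template name, pattern, target size, and volume below are just illustrative:

```python
# Sketch: size number_of_shards in a 6.x index template from expected
# daily volume, aiming for ~40GB per shard. All names/sizes are examples.
import math
import requests

ES = "http://localhost:9200"
TARGET_SHARD_GB = 40

def put_daily_template(expected_index_gb):
    shards = max(1, math.ceil(expected_index_gb / TARGET_SHARD_GB))
    template = {
        "index_patterns": ["logs-*"],  # matches the daily indexes
        "settings": {
            "index.number_of_shards": shards,
            "index.number_of_replicas": 1,
        },
    }
    requests.put(f"{ES}/_template/daily-logs", json=template).raise_for_status()

put_daily_template(expected_index_gb=200)  # ~200GB/day -> 5 primaries
```

Since the template only applies to newly created indexes, the oversized existing shards would age out under a deletion policy rather than needing to be reshaped.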
We are not using ILM, so it's hard to control shard sizes, and we have a "silver tape" solution: to speed up the cluster and avoid outages, we pre-create indexes using a daily index-naming convention.
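Roughly how that pre-creation works, with placeholder names and settings: a nightly cron job creates tomorrow's daily index ahead of time, so the first write after midnight doesn't pay the index-creation cost:

```python
# Sketch of nightly pre-creation of tomorrow's daily index.
# Index name pattern and settings are examples, not our exact config.
from datetime import datetime, timedelta
import requests

ES = "http://localhost:9200"

def precreate_tomorrow():
    name = "logs-" + (datetime.utcnow() + timedelta(days=1)).strftime("%Y.%m.%d")
    resp = requests.put(f"{ES}/{name}", json={
        "settings": {"index.number_of_shards": 5, "index.number_of_replicas": 1}
    })
    if resp.status_code == 400 and "already_exists" in resp.text:
        return  # index was already created; nothing to do
    resp.raise_for_status()

precreate_tomorrow()  # run nightly from cron
```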