Fyi: with the default values above I am seeing a huge difference in size and shard distribution.
My ask is: I need to keep size the same across the cluster irrespective of shard or index numbers. How can I achieve that, and what changes would be required in the variables above?
I'd suggest starting with our cluster documentation, which explains these values and what happens if you change them.
I think my bigger question is, though, to understand more about the requirements? What has to be equal and why does it have to be equal, rather than using Elasticsearch defaults?
- What has to be equal -> I want disk utilisation to be equal across nodes.
- Why does it have to be equal -> So that I don't have disk issues on some nodes while others are nearly empty.
- Elasticsearch defaults -> Right, I'm unable to understand what the default means: "disk_usage": "2.0E-11".
If I expand this I get 2.0E-11 = 2.0 × 10⁻¹¹ = 0.000000000020. So what does this mean?
If you're using the default sharding strategies and your only concern is balancing disk usage, shards should be roughly equal in most cases. I'd suggest only applying significant tuning if you're actually seeing problems.
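If it helps to verify how even things currently are before changing anything, the _cat APIs show per-node disk usage and individual shard sizes. A minimal sketch, assuming an unsecured cluster reachable at localhost:9200 (adjust the URL and auth for your environment):

```python
# Minimal sketch: inspect current disk and shard distribution via the _cat APIs.
# Assumes an unsecured local cluster at localhost:9200.
import requests

BASE = "http://localhost:9200"

# Per-node disk usage and shard counts.
print(requests.get(f"{BASE}/_cat/allocation?v").text)

# Individual shard sizes, largest first, to spot outliers.
print(requests.get(f"{BASE}/_cat/shards?v&s=store:desc").text)
```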
I think the main issue here is that the documentation is not very helpful in this case: it just shows the defaults but does not explain how they are used, so it is indeed cryptic for end users.
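For what it's worth, my reading of the docs is that disk_usage is a weight applied per byte of shard data on a node, which is why the default looks so tiny; scaled to realistic shard sizes it becomes comparable to the shard-count factor. A rough back-of-the-envelope sketch below (0.45 is the documented default for the shard factor, but treat the interpretation itself as my assumption, not an authoritative description of the balancer):

```python
# Rough interpretation (assumption, not an authoritative description of the
# balancer): disk_usage is a per-byte weight, so multiplying it by a shard's
# size in bytes gives its contribution to a node's balancing weight.
DISK_USAGE_FACTOR = 2.0e-11  # default cluster.routing.allocation.balance.disk_usage
SHARD_FACTOR = 0.45          # default cluster.routing.allocation.balance.shard

GIB = 1024 ** 3
for size_gib in (10, 50, 100):
    disk_weight = DISK_USAGE_FACTOR * size_gib * GIB
    print(f"{size_gib:>3} GiB of shard data adds ~{disk_weight:.2f} to a node's weight; "
          f"one extra shard adds {SHARD_FACTOR}")
```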
Thanks @leandrojmp for clearing this up. Maintaining an equal shard size across the cluster should help us solve the disk issues, since the number of shards across the cluster is also a criterion in the formula, along with several other factors. That could be an interim solution for us.
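As part of that interim approach, one way we are considering keeping shard sizes roughly uniform is rolling indices over by primary shard size in an ILM policy. A sketch, where the policy name and the 50gb threshold are placeholders for illustration only:

```python
# Sketch of an ILM policy that rolls an index over once a primary shard reaches
# a target size, so shard sizes stay roughly uniform across the cluster.
# "my-logs-policy" and "50gb" are placeholders; adjust for your workload.
import requests

BASE = "http://localhost:9200"  # assumes an unsecured local cluster

policy = {
    "policy": {
        "phases": {
            "hot": {
                "actions": {
                    "rollover": {"max_primary_shard_size": "50gb"}
                }
            }
        }
    }
}

resp = requests.put(f"{BASE}/_ilm/policy/my-logs-policy", json=policy)
print(resp.json())
```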
Also, a suggestion or feature to think about: there are use cases where we store data across different apps and Elasticsearch acts as warm storage behind MySQL. In those situations, several microservices might not be able to keep the same size per shard. Since ES is distributed, for this kind of use case we would simply keep adding nodes to the cluster and expect disk usage to stay even. If that flexibility were exposed to the end user, it would be a great plus. It may even exist already; some more documentation on how to handle my use case (keeping disk usage the same across nodes irrespective of load, shard count, etc.) would be helpful.
Again, thanks for your quick responses @Kathleen_DeRusso and @leandrojmp.
I have registered for the Bangalore meetup with the ES team on 25 Sept 2024 and will meet your team there.