I am new to ES and although I have read several blog posts and articles, I still cannot find the answers I am looking for. I need to ensure that shards in my ES database will have the optimal size (performance-wise).
I understand that the optimal shard size depends on multiple factors and (I suppose) can only be determined by testing with real data.
What I don't understand is how data is distributed across the shards of an index. For example, if an index has five primary shards and I set the maximum index size to 100 GB, will each shard be 20 GB when the index reaches its limit? If so, once I find the optimal shard size, can I just multiply it by the number of primary shards and use that value as the maximum index size? Or is it not this simple?
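To make the arithmetic I have in mind concrete, here is a minimal Python sketch. It assumes documents end up spread perfectly evenly across the primary shards, which is exactly the assumption I am unsure about. The function names are made up for illustration and are not part of any Elasticsearch API.

```python
def expected_shard_size_gb(index_size_gb: float, primary_shards: int) -> float:
    """Size of each primary shard, ASSUMING a perfectly even distribution."""
    if primary_shards <= 0:
        raise ValueError("primary_shards must be positive")
    return index_size_gb / primary_shards


def max_index_size_gb(target_shard_size_gb: float, primary_shards: int) -> float:
    """Index size limit that would give the target shard size, under the same assumption."""
    if primary_shards <= 0:
        raise ValueError("primary_shards must be positive")
    return target_shard_size_gb * primary_shards


if __name__ == "__main__":
    # Five primary shards and a 100 GB index limit, as in my example.
    print(expected_shard_size_gb(100, 5))
    # The reverse direction: a 20 GB target shard size with five primary shards.
    print(max_index_size_gb(20, 5))
```

If the distribution is not even in practice, the real shard sizes would deviate from these numbers, and that is the part I would like to understand.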
And more generally, why is there no tool for simply specifying the maximum shard size, in the way that the Rollover API lets me specify the maximum index size?