How is data within an index distributed into shards?

I am new to ES and although I have read several blog posts and articles, I still cannot find the answers I am looking for. I need to ensure that shards in my ES database will have the optimal size (performance-wise).

I understand that the optimal size of a shard depends on multiple factors and that it (I suppose) can only be determined by testing with real data.

What I don't understand is how data is distributed into shards within an index. For example, if there are five (primary) shards in an index and I set the maximum index size to 100 GB, will each shard be 20 GB when the index reaches its limit? So, if I find out the optimal shard size, can I just multiply it by the number of shards in an index and use this value as the maximum index size? Or is it not that simple?
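For context on the first question: Elasticsearch picks a document's shard from a hash of its routing value (the `_routing` field, which defaults to the document `_id`), roughly `shard = hash(_routing) % number_of_primary_shards`. The exact formula varies between versions. The sketch below is only an illustration under that assumption. Elasticsearch itself uses a Murmur3 hash, and here MD5 from the standard library stands in for it, while `shard_for` and `distribute` are hypothetical helper names. It shows that document *counts* end up roughly even across shards. Shard *sizes* are only even if documents are of similar size, so "100 GB over five shards = 20 GB each" holds approximately, not exactly.

```python
import hashlib
from collections import Counter


def shard_for(routing: str, num_primary_shards: int) -> int:
    # MD5 is a stand-in for Elasticsearch's Murmur3 hash (assumption for illustration).
    digest = hashlib.md5(routing.encode("utf-8")).digest()
    h = int.from_bytes(digest[:4], "little")
    # Same document id -> same shard, as long as the primary shard count is unchanged.
    return h % num_primary_shards


def distribute(doc_ids, num_primary_shards: int) -> Counter:
    # Count how many documents land on each primary shard.
    return Counter(shard_for(doc_id, num_primary_shards) for doc_id in doc_ids)


if __name__ == "__main__":
    counts = distribute((f"doc-{i}" for i in range(10_000)), 5)
    for shard in sorted(counts):
        print(f"shard {shard}: {counts[shard]} docs")
```

Because the shard depends on the number of primary shards, that number cannot be changed on an existing index without reindexing (or using the shrink/split APIs), which is one reason sizing is usually planned per index.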

And more generally, why is there no tool for simply specifying the maximum shard size instead of the maximum index size (such as the Rollover API)?

> and I set the maximal index size to 100 GB

No, you never do that with Elasticsearch: there is no maximum index size to set. The sky is the limit 🙂

May I suggest you look at the following resources about sizing:

Thank you for your response!

I will most certainly go through the resources you suggested. However, I am a bit perplexed by your answer, because in the very article you recommended (and that I have already read), "How many shards should I have in my Elasticsearch cluster?", they say that "a shard size of 50GB is often quoted as a limit that has been seen to work". So doesn't it make sense to limit the size of indices then (because they consist of shards)? Or how else do I control the size of shards? I feel like I am missing something here...

Yes, but that's a theoretical limit, not an enforced one. If you send more data, Elasticsearch will still accept it and index it.
So there is no "limit" as an index setting or anything like that.
I'd say the only limit I know of for now is the available disk space.

> So doesn't it make sense to limit the size of indices

Yes. But, as I said, Elasticsearch does not have a built-in setting for that.

> Or how else do I control the size of shards?

You can use the rollover API to achieve what you are looking for: keeping data under a given limit per shard.
See Rollover Index | Elasticsearch Reference [6.2] | Elastic

OK, so just to check that I am on the same page: the way to control shard size (keep data under a given limit per shard) is to control the index size and the number of shards per index. Do you agree?
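The arithmetic behind this can be sketched as follows. The sketch assumes documents are spread roughly evenly across primary shards, so the index's primary store size is about the target shard size times the number of primaries. It builds a body for a rollover request (`POST /<alias>/_rollover`) using the `max_size` condition, which, as I understand the docs, measures the index's primary shards (replicas not included). The helper names are hypothetical, 50 GB is just the figure quoted in the article above, not a recommendation, and which conditions (`max_age`, `max_docs`, `max_size`) are available depends on your Elasticsearch version.

```python
def max_index_size_gb(target_shard_gb: int, num_primary_shards: int) -> int:
    # Assumes roughly even routing: primary store size ~= target shard size x primaries.
    return target_shard_gb * num_primary_shards


def rollover_conditions(target_shard_gb: int, num_primary_shards: int) -> dict:
    # Hypothetical helper: request body for POST /<alias>/_rollover.
    total = max_index_size_gb(target_shard_gb, num_primary_shards)
    return {"conditions": {"max_size": f"{total}gb"}}


if __name__ == "__main__":
    print(rollover_conditions(50, 5))  # prints {'conditions': {'max_size': '250gb'}}
```

Keep in mind that rollover conditions are only evaluated when the rollover request is made (by you or by a scheduled job), so an index can grow past the threshold between checks. Treat the target as approximate.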

Thank you very much, your answers have been very helpful to me! ☀️

I agree on that.
