I am new to ES and although I have read several blog posts and articles, I still cannot find the answers I am looking for. I need to ensure that shards in my ES database will have the optimal size (performance-wise).
I understand that the optimal size of a shard depends on multiple factors and (I suppose) can only be determined by testing with real data.
What I don't understand is how data is distributed into shards within an index. For example, if there are five (primary) shards in an index and I set the maximal index size to 100 GB, will each shard be 20 GB when the index reaches its limit? So, if I find out the optimal shard size I can just multiply it by the number of shards in an index and use this value as the maximal index size? Or is it not this simple?
And more generally, why is there no tool for simply specifying the maximal shard size instead of the maximal index size (such as the Rollover API)?
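For reference, this is the kind of condition the Rollover API accepts, as far as I can tell: a cap on total index size, not per-shard size (a sketch; the alias name `logs_write` is made up):

```
POST /logs_write/_rollover
{
  "conditions": {
    "max_size": "100gb"
  }
}
```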
I will most certainly go through the resources you suggested. However, I am a bit perplexed by you saying
because in the very article you recommended (and that I have already read), "How many shards should I have in my Elasticsearch cluster?", they say that "a shard size of 50GB is often quoted as a limit that has been seen to work". So doesn't it make sense to limit the size of indices then (because they consist of shards)? Or how else do I control the size of shards? I feel like I am missing something here...
Yes. That's a theoretical limit. If you send more data, Elasticsearch will still accept and index it.
So there is no "limit" as an index setting or anything like that.
I'd say that the only limit I know for now is the available disk space.
So doesn't it make sense to limit the size of indices
Yes. But Elasticsearch does not have that built-in setting, as I said.
OK, just to clarify that I am on the same page: the way to control shard size (i.e. keep the data under a given limit per shard) is to control the index size and the number of shards per index. Do you agree with that?
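In other words, something like this sketch (assuming a target of 50 GB per primary shard; the index and alias names are made up): create the index with a fixed number of primary shards, then roll over when the index reaches shards × target, i.e. 5 × 50 GB = 250 GB here:

```
PUT /logs-000001
{
  "settings": {
    "number_of_shards": 5
  },
  "aliases": {
    "logs_write": {}
  }
}

POST /logs_write/_rollover
{
  "conditions": {
    "max_size": "250gb"
  }
}
```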
Thank you very much, your answers are very helpful to me!