We have created around 20-25 data streams (each with a single primary shard and no replicas), so I would expect our heap requirement to be around 2GB. I have verified that only one index exists per data stream.
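For reference, the backing indices behind each data stream can be listed with the _data_stream API (the stream name below is just a placeholder for one of ours):

GET _data_stream/logs-app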
However, if I look at the state of the deployment, it shows 165 primary shards.
As we have only created a maximum of 25 shards ourselves, most of these are (as far as I know) system shards which we have no control over? For example there are .watcher-, apm-, elastic-cloud-, .ent_search- and .kibana- shards.
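A quick way to see where all 165 shards come from is to list them grouped by index, using the standard _cat/shards parameters:

GET _cat/shards?v&h=index,shard,prirep,state&s=index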
Do these shards need to be taken into account when sizing our deployment? With 165 shards we would need a heap size of over 8GB!
A data stream is backed by a number of time-based indices, not a single index. The number of indices will depend on how often it is configured to roll over. If it rolled over every day, each data stream would consist of 60 primary shards plus replicas given your retention period (60 daily indices for 60 days of retention), so 20 data streams would result in 1200 primary shards plus replicas. You therefore need to check your ILM settings to see how many shards will be generated.
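You can check this with the ILM APIs; the data stream and policy names below are placeholders. The first request shows which policy and phase each backing index is in, and the second shows the rollover and delete conditions the policy defines:

GET my-data-stream/_ilm/explain

GET _ilm/policy/my-policy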
Also note that 20 shards per GB of heap is a maximum, not a recommended level. There have been improvements since that old blog post was written, but as far as I know the guidance is still valid.
I understand that a data stream has time-based indices and that these should be taken into account when sizing the deployment. I have an ILM policy that rolls over at 30 days/30GB, after which the index is deleted one day later. As this is a test setup, the rollover will most likely occur at 30 days, as there is very little data.
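As a rough sketch, the policy looks something like this (the policy name and exact condition fields here are approximate, not copied from our setup):

PUT _ilm/policy/logs-test-policy
{
  "policy": {
    "phases": {
      "hot": {
        "actions": {
          "rollover": {
            "max_age": "30d",
            "max_size": "30gb"
          }
        }
      },
      "delete": {
        "min_age": "1d",
        "actions": {
          "delete": {}
        }
      }
    }
  }
}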
My question was more around the other indices (be they data-stream-based or normal indices), e.g. the ones I noted in my original post.
As an example, I configured my deployment to ship its logs and metrics to a deployment called 'Monitor'. The 'Monitor' deployment has had no manual changes, yet if I do a GET _cat/shards there are 113 shards listed. So, roughly speaking, this deployment would require a heap size of 5-6GB if all 113 shards are included in the deployment sizing recommendations?
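For what it's worth, the shard totals can also be pulled directly from the cluster health API rather than counting _cat/shards output by hand:

GET _cluster/health?filter_path=active_shards,active_primary_shards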
It is a general guideline, not an absolute limit. Your cluster may be fine, as a number of improvements have been made with respect to heap usage since that blog post was published, especially as you are not creating lots of small shards.