Number of shards and deployment sizing

We have created a small deployment on Elastic Cloud which we are testing with non-live data.

The blog post "How many shards should I have in my Elasticsearch cluster?" on the Elastic blog suggests approximately 20 shards per GB of heap space.
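(For anyone following along: per-node heap can be checked with the cat nodes API; heap.current and heap.max are standard _cat/nodes columns.)

# Show configured and current heap per node
GET _cat/nodes?v&h=name,heap.current,heap.max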

We have created around 20-25 data streams (each set to a single primary shard, no replicas), so I would expect our heap requirement to be around 2GB. I have verified that only 1 index exists per data stream.
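For completeness, this is roughly how each data stream's template is configured (the template name and index pattern here are illustrative, not our real ones):

# Composable index template: one primary shard, no replicas
PUT _index_template/logs-myapp-template
{
  "index_patterns": ["logs-myapp-*"],
  "data_stream": {},
  "template": {
    "settings": {
      "index.number_of_shards": 1,
      "index.number_of_replicas": 0
    }
  }
}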

However, if I look at the state of the deployment, it shows 165 primary shards.
As we have only created a maximum of 25 shards ourselves, most of these are (as far as I know) system shards which we have no control over?

For example, there are .watcher-, apm-, elastic-cloud-, .ent_search-, .kibana- etc. shards.
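To see where they all come from, the cat indices API can also list hidden and system indices (expand_wildcards is a standard query parameter):

# List every index, including hidden/system ones, with shard counts
GET _cat/indices?v&expand_wildcards=all&h=index,pri,rep,store.size&s=index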

Do these shards need to be taken into account when sizing our deployment? With 165 shards we would need a heap size of over 8GB!

Any advice much appreciated,

Jon


A data stream is backed by a number of time-based indices, not a single index. The number of indices will depend on how often it is configured to roll over based on time. If it rolled over every day, each data stream would consist of 60 primary shards plus replicas, given your retention period. In that case 20 data streams would result in 1200 primary shards plus replicas. You therefore need to check your ILM settings to see how many shards will be generated.
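You can check the backing indices of a data stream and their ILM state directly; the data stream name here is just an example, substitute your own:

# List the backing indices behind a data stream
GET _data_stream/logs-myapp-default

# Show the ILM phase and age of each backing index
GET .ds-logs-myapp-default-*/_ilm/explain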

Also note that 20 shards per GB of heap is a maximum, not a recommended level. There have been improvements since that old blog post was written, but as far as I know the guideline is still valid.

Hi Christian, thanks for responding.

I understand that a data stream has time-based indices and that these should be taken into account when sizing the deployment. I have an ILM policy that rolls over at 30 days/30GB and then deletes the index 1 day after rollover. As this is a test setup, the likelihood is that rollover will occur at 30 days as there is very little data.
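For context, this is roughly what the policy looks like (the policy name is illustrative; the conditions match what I described):

# Roll over at 30 days or 30GB, delete 1 day after rollover
PUT _ilm/policy/logs-30d-policy
{
  "policy": {
    "phases": {
      "hot": {
        "actions": {
          "rollover": {
            "max_age": "30d",
            "max_size": "30gb"
          }
        }
      },
      "delete": {
        "min_age": "1d",
        "actions": {
          "delete": {}
        }
      }
    }
  }
}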

My question was more around the other indices (be they data stream based or normal indices), e.g. the ones I noted in my original post.

As an example, I configured my deployment to ship its logs and metrics to a deployment called 'Monitor'. The 'Monitor' deployment has had no manual changes, yet if I do a
GET _cat/shards

there are 113 shards listed. So roughly speaking, this deployment would require a heap size of 5-6GB if all 113 shards are included in the deployment sizing calculation?
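A quicker way to get the totals, if it helps anyone, is the cluster health API (filter_path is a standard query parameter):

# Report total active shards and active primaries for the cluster
GET _cluster/health?filter_path=active_shards,active_primary_shards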

I hope that makes sense.

Thanks
Jon

It is a general guideline, not an absolute limit. Your cluster may be fine, as a number of improvements have been made with respect to heap usage since that blog post was published, especially as you are not creating lots of small shards.


Thanks for the info, Christian.
