We have created around 20-25 data streams (each with a single primary shard and no replicas), so I would expect our heap requirement to be around 2GB. I have verified that only one index exists per data stream.
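For reference, the backing indices behind each data stream can be listed with the _data_stream API (the stream name below is just a placeholder for one of ours):

GET _data_stream/logs-app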
However, if I look at the state of the deployment, it shows 165 primary shards.
As we have only created a maximum of 25 shards ourselves, most of these are (as far as I know) system shards which we have no control over? For example there are .watcher-, apm-, elastic-cloud-, .ent_search- and .kibana- shards.
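A quick way to see where all 165 shards come from is to list them grouped by index, using the standard _cat/shards parameters:

GET _cat/shards?v&h=index,shard,prirep,state&s=index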
Do these shards need to be taken into account when sizing our deployment? With 165 shards we would need a heap size of over 8GB!
A data stream is backed by a number of time-based indices, not a single index. The number of indices will depend on how often it is configured to roll over. If it rolled over every day, each data stream would consist of 60 primary shards plus replicas given your retention period (60 daily indices for 60 days of retention), so 20 data streams would result in 1200 primary shards plus replicas. You therefore need to check your ILM settings to see how many shards will be generated.
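You can check this with the ILM APIs; the data stream and policy names below are placeholders. The first request shows which policy and phase each backing index is in, and the second shows the rollover and delete conditions the policy defines:

GET my-data-stream/_ilm/explain

GET _ilm/policy/my-policy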
Also note that 20 shards per GB of heap is a maximum, not a recommended level. There have been improvements since that old blog post was written, but as far as I know the guidance is still valid.
I understand that a data stream has time-based indices and that these should be taken into account when sizing the deployment. I have an ILM policy that rolls over at 30 days/30GB, after which the index is deleted one day later. As this is a test setup, the rollover will most likely occur at 30 days, as there is very little data.
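As a rough sketch, the policy looks something like this (the policy name and exact condition fields here are approximate, not copied from our setup):

PUT _ilm/policy/logs-test-policy
{
  "policy": {
    "phases": {
      "hot": {
        "actions": {
          "rollover": {
            "max_age": "30d",
            "max_size": "30gb"
          }
        }
      },
      "delete": {
        "min_age": "1d",
        "actions": {
          "delete": {}
        }
      }
    }
  }
}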
My question was more around the other indices (be they data-stream-based or normal indices), e.g. the ones I noted in my original post.
As an example, I configured my deployment to ship its logs and metrics to a deployment called 'Monitor'. The 'Monitor' deployment has had no manual changes, yet if I do a GET _cat/shards there are 113 shards listed. So, roughly speaking, this deployment would require a heap size of 5-6GB if all 113 shards are included in the deployment sizing recommendations?
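For what it's worth, the shard totals can also be pulled directly from the cluster health API rather than counting _cat/shards output by hand:

GET _cluster/health?filter_path=active_shards,active_primary_shards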
It is a general guideline, not an absolute limit. Your cluster may be fine, as a number of improvements have been made with respect to heap usage since that blog post was published, especially as you are not creating lots of small shards.