This suggests keeping to less than 20 shards / GB of heap.
But this blog post about the frozen tier seems (if I read it correctly) to have benchmarked a test case with 12,500 shards on a node with 29GB of heap, for a ratio of ~431 shards / GB.
So the simple question is: do frozen shards have a smaller or nil heap requirement? Even more to the point: do you need to size your frozen tier based on the projected number of shards, regardless of the data size?
This is most pertinent when trying to keep very long durations of logs and observability data in long-term (>1 year) retention.
At the moment my minimal Elastic Cloud Frozen tier has ~2400 shards (= indices, since we force merge at rollover) that it's managing in its 1.8GB of heap, and it seems to be ok (memory pressure running a bit over 50%), which suggests that the heap impact per Frozen tier shard is pretty limited. But the shard count will only go up as time passes and indices continue to roll over.
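For reference, here's roughly how I'm counting those shards; a quick sketch where the endpoint and credentials are placeholders for my own deployment, and it assumes the frozen data nodes report the data_frozen role (abbreviated `f`) in the cat nodes output:

```python
# Rough sketch: count shards currently allocated to frozen-tier nodes.
# The endpoint and credentials are placeholders; adjust for your deployment.
import requests

ES = "https://my-deployment.es.example.com:9243"   # placeholder URL
AUTH = ("elastic", "changeme")                      # placeholder credentials

# _cat/nodes exposes each node's roles; data_frozen shows up as "f"
# in the abbreviated node.role column.
nodes = requests.get(f"{ES}/_cat/nodes?format=json&h=name,node.role", auth=AUTH).json()
frozen_nodes = {n["name"] for n in nodes if "f" in n["node.role"]}

# _cat/shards lists every shard copy along with the node it lives on.
shards = requests.get(f"{ES}/_cat/shards?format=json&h=index,node", auth=AUTH).json()
frozen_shards = [s for s in shards if s.get("node") in frozen_nodes]

print(f"{len(frozen_shards)} shards on frozen nodes {sorted(frozen_nodes)}")
```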
To be clear, when I casually wrote "frozen shards" I meant "shards which exist on the Frozen tier". Which I think (but I'm new to this) is somehow different than indices that have been frozen by the freeze/unfreeze API.
So is there some sort of guidance for how many "shards which exist on the frozen tier" can be managed by a GB of heap on the frozen node?
So it sounds like you are speaking of the new frozen tier, which is backed by searchable snapshots.
But feel free to read this just to clear up any potential confusion.
So, assuming you are speaking about the new frozen tier: the guidance about the ratio of 20 shards per 1GB of JVM memory is not meant to be applied to the frozen tier backed by searchable snapshots.
So you do not size your frozen tier node based on the number of shards.
A good way to think of a frozen tier node is to look at what we do in the Elastic Cloud hardware profiles.
We use a node with 8 vCPU, 60 GB RAM, and 4.8TB SSD, which supports up to about 100TB of searchable snapshots (on AWS this is an i3en).
Of course there can always be use-case-specific considerations, but that's what we do in Elastic Cloud. If you have normal shard sizes, say between 10 and 50 GB per shard, you should be at a decent starting point.
Yes, that blog showed an extreme case of a petabyte with 12,500 shards hanging off a single node... I would interpret that as an upper bound... But it's pretty cool that it works that way.
So you could create a node as described above and start adding S3 searchable snapshots, starting with 100TB or so, and keep adding until the performance flattens out or no longer meets your requirements.
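If it helps, mounting an index from an S3 repository onto the frozen tier looks roughly like the sketch below; the repository, snapshot, and index names are placeholders, and in practice ILM can perform this step for you automatically:

```python
# Minimal sketch: mount an index from an S3 snapshot repository onto the
# frozen tier as a partially mounted (shared_cache) searchable snapshot.
# Repository, snapshot, and index names are placeholders.
import requests

ES = "https://my-deployment.es.example.com:9243"
AUTH = ("elastic", "changeme")

resp = requests.post(
    f"{ES}/_snapshot/my-s3-repo/my-snapshot-2022.05.01/_mount",
    params={"storage": "shared_cache", "wait_for_completion": "true"},
    json={"index": "logs-apache.access-000042"},
    auth=AUTH,
)
resp.raise_for_status()
print(resp.json())
```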
In recent versions the limiting factor in the frozen tier is likely the total number of mapped fields rather than the number of indices. The referenced blog post used shards with a pretty small field count and shows that you can achieve very high densities with such mappings. However if your mappings have hundreds (or thousands!) of fields then you'll run out of resources sooner.
I've asked whether we can add a bit more detail about the mappings used in this impressive benchmark setup.
Also note that we're working on some changes to the sizing guidance for 8.2+ in the docs in this PR. It's not committed yet but I don't expect it to change drastically from its current form.
It seems that the short answer is "it's complicated". Which makes it hard for new users like myself utilizing Elastic Cloud to configure things appropriately.
What's particularly unclear to me at the moment is how to apply the advice in the aforementioned "size-your-shards.asciidoc" to the Frozen tier. Intuitively, it seems like the searchable snapshots should represent a different overhead, but the fact that they're just another index seems to suggest that they really aren't?
Our particular situation is we have a small environment we're trying to monitor with Elastic. We have <20 Linux servers with the System integration deployed to them + Elastic Security to a few + specific integrations for specific server use cases (e.g. Apache, MySQL, AWS) + a handful of custom logs. All told there's just under 80 data streams. A fair number (around 15 it appears) are for monitoring Elastic itself. The tension that I'm trying to balance is that while we want to keep this data for an extended period (up to a year) we don't really need the data on "hot" past about 1-2 days after it was created. So we don't need (or want from a cost perspective) a huge Hot Tier as we would if we allowed 10GB per data stream to exist in Hot. Hence we rollover to Frozen fairly frequently so we can get the data to the cheaper storage.
Trying to tune the ILM policies to limit the rollovers is an ongoing process. And while the shard growth has slowed, we're only a couple of months into our targeted year-plus retention, so the count will continue to grow. So the question becomes: when does it become a problem, and how do we mitigate that problem without arbitrarily making Hot larger to hold more data before we roll over?
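For context, the shape of policy we're iterating on looks roughly like the sketch below; the policy name, thresholds, and repository name are just illustrative of our setup, not a recommendation:

```python
# Illustrative sketch of the kind of ILM policy we're tuning: roll over out
# of hot quickly, then move the index to the frozen tier as a searchable
# snapshot. All names and thresholds here are placeholders.
import requests

ES = "https://my-deployment.es.example.com:9243"
AUTH = ("elastic", "changeme")

policy = {
    "policy": {
        "phases": {
            "hot": {
                "actions": {
                    "rollover": {
                        "max_age": "2d",                  # don't keep data hot past ~2 days
                        "max_primary_shard_size": "10gb"  # ...or once a shard gets big enough
                    },
                    "forcemerge": {"max_num_segments": 1}
                }
            },
            "frozen": {
                "min_age": "0d",
                "actions": {
                    "searchable_snapshot": {"snapshot_repository": "found-snapshots"}
                }
            },
            "delete": {
                "min_age": "365d",
                "actions": {"delete": {}}
            }
        }
    }
}

r = requests.put(f"{ES}/_ilm/policy/logs-long-retention", json=policy, auth=AUTH)
r.raise_for_status()
```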
That's right, today they handle mappings in much the same way as any other index, with the same overheads. They're more lightweight than other indices in various other ways (e.g. disk space and CPU), but with all the other ways that Elasticsearch has become more streamlined recently the limiting factor is now usually the mappings. #86440 tracks ongoing work to trim down this area too.
I think the docs PR I linked above has everything you need to come up with a reasonable estimate to answer this. If you disagree, can you help me understand what's missing?
Either scale up your hot tier to roll over less frequently, or scale up your master nodes and frozen tier to handle the expected quantity of data. Or maybe do a bit of both, since the sweet spot will depend on your cost structure and may well be somewhere in the middle.
Remember I know almost nothing (relatively speaking) about the details of Elastic; I was just sold on the "Elastic Cloud is easy, we have one-click integrations for you!" pitch. And selecting node sizes seems to be focused on the storage size, whereas in reality the amount of data you're sending in seems somewhat less important than the number of indices, shards, and fields, which drives the memory requirements and seems like a possibly more important sizing metric for selecting node sizes.
The referenced document definitely does not have a consolidated formula, although it does say "Each index, shard and field has overhead", which implies that it would be possible to say something like:
indices * x + shards * y + fields * z + base overhead = heap requirement
We might infer from later statements that:
x = 1GB / 3000 = ~350 KiB
z = 1KiB
There's nothing that really talks about the per-shard overhead, though, and there is a statement that "Segments play a big role in a shard’s resource usage.", so it's possible that the correct metric is really segments, not shards?
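Just to make that guess concrete, this is the back-of-envelope arithmetic I end up doing with those inferred constants; the per-shard term and the fields-per-index number are pure guesses on my part:

```python
# Back-of-envelope heap estimate using the constants inferred above.
# The per-index and per-field numbers come from the sizing doc; the
# per-shard term and the field count per index are guesses.
KIB = 1024
GIB = 1024 * 1024 * KIB

per_index_overhead = GIB / 3000   # x: ~350 KiB per index
per_field_overhead = 1 * KIB      # z: ~1 KiB per mapped field, per index
per_shard_overhead = 0            # y: unknown; the doc doesn't say

indices = 2400                    # our current frozen-tier index count
shards_per_index = 1              # one shard per index after force merge at rollover
fields_per_index = 100            # guess; depends heavily on the integrations' mappings

heap_bytes = (
    indices * per_index_overhead
    + indices * shards_per_index * per_shard_overhead
    + indices * fields_per_index * per_field_overhead   # total fields * z
)
print(f"estimated heap: {heap_bytes / GIB:.2f} GiB")    # ~1.03 GiB with these guesses
```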
Also, it doesn't seem like there's an easy way to find how many fields you actually have. I see the field usage API, but it looks like I would have to issue that for every single index? And then what value am I looking for: shards[n].stats.all_fields.any? shards[n].stats.all_fields.stored_fields? Is the total number of fields more easily reflected somewhere in the monitoring / management dashboards that I'm not seeing?
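The closest workaround I've found is to pull the mappings and count the leaf fields myself, something like the sketch below (the index pattern is a placeholder, and this counts mapped fields rather than fields actually in use):

```python
# Workaround sketch: count mapped (leaf) fields per index by walking the
# mapping returned by GET /<target>/_mapping. This counts what is mapped,
# not what is actually used, and the target pattern is a placeholder.
import requests

ES = "https://my-deployment.es.example.com:9243"
AUTH = ("elastic", "changeme")

def count_leaf_fields(properties: dict) -> int:
    """Recursively count leaf fields (including multi-fields) in a mapping."""
    total = 0
    for field_def in properties.values():
        sub = field_def.get("properties")
        if sub:                                         # object field: recurse
            total += count_leaf_fields(sub)
        else:
            total += 1                                  # leaf field
            total += len(field_def.get("fields", {}))   # multi-fields like .keyword
    return total

mappings = requests.get(f"{ES}/logs-*/_mapping", auth=AUTH).json()
for index, body in sorted(mappings.items()):
    props = body["mappings"].get("properties", {})
    print(f"{index}: {count_leaf_fields(props)} mapped fields")
```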
Another thing not clear to me is your comment about "or scale up your master nodes". My Elastic Cloud deployment did not create a dedicated master. As we're talking about this heap utilization and it being a limiting factor my assumption was that it was the heap on the Frozen node that's going to be the critical factor because that's where all the searchable snapshot indices live. But are you suggesting that maybe that impact is really on the (not dedicated at the moment) Master node? If adding a Master node would help, how does one size that?
There's also a discussion of cluster.max_shards_per_node setting a maximum limit on shards, and a recommendation to decrease your cluster's shard count or increase the value "If you’re confident your changes won’t destabilize the cluster", which is hard to act on if you don't know how many shards your cluster should be able to support. And the previous discussion on the page implies that shard count is really only one piece of the puzzle for what can reasonably be supported, so that limit seems like a perhaps too-crude metric? (And if it's described as "per node", does that mean per all nodes, or just per hot nodes, or...?)
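For what it's worth, this is how I've been checking what the current limits and shard totals actually are (endpoint and credentials are placeholders; there also appears to be a separate frozen-tier variant of the setting):

```python
# Sketch: inspect the shard-count limit settings and the cluster's current
# shard total. Defaults are shown if the settings were never overridden.
import requests

ES = "https://my-deployment.es.example.com:9243"
AUTH = ("elastic", "changeme")

settings = requests.get(
    f"{ES}/_cluster/settings",
    params={"include_defaults": "true", "flat_settings": "true"},
    auth=AUTH,
).json()
merged = {**settings["defaults"], **settings["persistent"], **settings["transient"]}
print("cluster.max_shards_per_node        =", merged.get("cluster.max_shards_per_node"))
print("cluster.max_shards_per_node.frozen =", merged.get("cluster.max_shards_per_node.frozen"))

health = requests.get(f"{ES}/_cluster/health", auth=AUTH).json()
print("active shards in cluster           =", health["active_shards"])
```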
On Cloud the short answer is "enable autoscaling" because this will let the platform automatically size your cluster correctly for the data it holds.
To give a bit more historical context, shards used to be pretty heavyweight so it made sense to focus on shard count, and that's the origin of guidelines about shard counts per GB and features like cluster.max_shards_per_node. These days that's not really true: shards are not completely free but other concerns tend to dominate the cluster's resource needs. However sizing guidelines etc tend to rely on operational experience and therefore lag behind the reality of the situation a bit.
I don't think there's a very easy way to count the fields in an index today (but it's on the roadmap to add relevant metrics like this one, hopefully soon: #86639).
Cloud will add dedicated master nodes when it decides your cluster is large enough to need them.
Please note that autoscaling does not work off of CPU or memory for Elasticsearch data nodes, only disk usage. (We do autoscale on memory for machine learning nodes.)
So in my case where it seems like the heap appears to be the greater limiting factor, autoscaling wouldn't help. (Even if I was comfortable with it bumping costs by ~40% because some index got busy. This is already more expensive than expected.)
It appears that dedicated master nodes are only added to the cloud configuration when you configure more nodes, not based on how busy or big the nodes might be or how many indices/shards/fields you have:
Dedicated master nodes will be automatically added once you reach 6 Elasticsearch nodes across all zones.
I agree issue #86639 is much needed for the reasons stated in that issue!
At the moment I seem stuck with "do what you can to reduce the index/shard count as much as possible while staying within your Hot tier budget and hope it doesn't become a problem for the Frozen tier in the future".