Clarification about recommended memory-disk ratio of 1:30

One of the most common problems when Elasticsearch is used with time-based indices is that users end up with far too many small shards. This results in a large cluster state and generally poor performance. Searching a single small shard is faster than searching a single large one (latency tends to be proportional to size), but searching lots of small shards is not necessarily faster than searching a few large ones. This is where the recommendation about shards per node and shard size comes from. A 40GB-50GB shard is generally quite performant for time series data, although this can depend on document size and mappings.
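If it helps, here is a minimal sketch (Python with the requests library, assuming an unauthenticated cluster on localhost:9200) of how you could list your largest primary shards and compare them against that 40GB-50GB guideline:

```python
# Minimal sketch: list the largest primary shards in a cluster so they can be
# compared against the ~40GB-50GB guideline above. Assumes a cluster reachable
# at http://localhost:9200 with no authentication; adjust URL/auth as needed.
import requests

resp = requests.get(
    "http://localhost:9200/_cat/shards",
    params={
        "format": "json",               # return JSON instead of plain text
        "bytes": "b",                   # report store size in bytes
        "h": "index,shard,prirep,store",
    },
)
resp.raise_for_status()

shards = resp.json()
# Keep assigned primaries and sort by on-disk size, largest first
primaries = [s for s in shards if s["prirep"] == "p" and s["store"] is not None]
primaries.sort(key=lambda s: int(s["store"]), reverse=True)

for s in primaries[:20]:
    size_gb = int(s["store"]) / 1024 ** 3
    print(f"{s['index']} shard {s['shard']}: {size_gb:.1f} GB")
```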

If you read the blog post again I think you will notice that it recommends a maximum of 20 shards per GB of heap. It does not state that a node should be able to handle 600 shards of any particular size. Staying below 20 shards per GB of heap will generally help you avoid the problems associated with having too many shards, but if you are using large shards I would expect you to have far fewer shards than this.
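To make the arithmetic concrete, here is a small sketch showing why the 20 shards per GB of heap figure is a ceiling rather than a target (the heap and shard sizes below are just illustrative assumptions):

```python
# Minimal sketch of the arithmetic behind the guideline: 20 shards per GB of
# heap is a ceiling on shard count, not a data-volume target. The heap size
# and average shard size are illustrative assumptions, not recommendations.

heap_gb = 30                      # e.g. a node with a 30GB heap
max_shards = 20 * heap_gb         # ceiling from the guideline: 600 shards

avg_shard_gb = 45                 # using the ~40GB-50GB shard size above
data_at_ceiling_tb = max_shards * avg_shard_gb / 1024

print(f"Shard-count ceiling: {max_shards} shards")
print(f"With 45GB shards that ceiling would imply ~{data_at_ceiling_tb:.0f} TB on one node,")
print("far more than a node with a 30GB heap realistically holds, so with large")
print("shards you should end up well below the ceiling.")
```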

When it comes to sizing different types of nodes I tend to think of it in terms of balancing resource usage, e.g. heap memory, CPU and disk I/O. Different types of nodes will have different resource profiles. Hot nodes may have the same heap size as a warm or cold node, but generally have much higher I/O capacity and more CPU cores. The amount of resources required also depends on how the node is used. This webinar discusses this and talks about how indexed data on disk affects heap usage.

Hot nodes that serve indexing requests will be parsing and indexing a lot of data, which at high ingestion rates requires a good amount of heap and CPU. The process of writing indexed data to disk and merging segments is also quite I/O intensive. As indexing data is an expensive process you often want to generate time-based indices with the shard size you want to use in later phases, although you can use the shrink index API to adjust this (see the sketch below). All this processing means there is less heap memory available for indexed data on disk, which is why this type of node tends to store relatively little data. I have seen many cases where RAM-to-disk ratios have been less than 1:30 for an optimal cluster. A common assumption is also that the most recent data is the most heavily queried and that data gets queried less the older it gets. This means that hot nodes also serve queries for the data they hold, which further uses resources in terms of heap, CPU and disk I/O.
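As an illustration of that shrink option, here is a minimal sketch (Python with the requests library; the index names, node name and target shard count are hypothetical) of the two steps the shrink index API involves:

```python
# Minimal sketch of using the shrink index API mentioned above to reduce the
# shard count of a time-based index before it moves to warm nodes. Index names,
# the node name and the target shard count are hypothetical; the cluster is
# assumed to be reachable on localhost:9200 without authentication.
import requests

ES = "http://localhost:9200"
SOURCE = "logs-2019.06.01"         # hypothetical source index with several shards
TARGET = "logs-2019.06.01-shrunk"  # hypothetical target index

# 1. Move a copy of every shard to a single node and block writes,
#    which the shrink API requires.
requests.put(
    f"{ES}/{SOURCE}/_settings",
    json={
        "settings": {
            "index.routing.allocation.require._name": "warm-node-1",  # hypothetical node
            "index.blocks.write": True,
        }
    },
).raise_for_status()

# 2. In practice, wait for the relocation triggered above to finish (e.g. by
#    polling cluster health) before shrinking into a single-shard index and
#    clearing the temporary settings on the target.
requests.post(
    f"{ES}/{SOURCE}/_shrink/{TARGET}",
    json={
        "settings": {
            "index.number_of_shards": 1,
            "index.routing.allocation.require._name": None,
            "index.blocks.write": None,
        }
    },
).raise_for_status()
```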

If you compare this to warm nodes that do not perform any indexing and where almost all resources can be dedicated to storing data and serving queries, these are able to allocate much more heap to holding data on disk. Even though they may have slower storage with worse I/O performance, the fact that they primarily serve queries means this goes further.

For cold nodes that store frozen indices, heap usage is even lower. These nodes can store data at very high RAM-to-disk ratios as indexed data in the frozen state uses very little heap. A very small node can serve very large amounts of data, and the shards-per-GB-of-heap guideline does not apply here. The amount of data stored on this type of node is often limited by the required query latency rather than by system resources.
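To put the ratios into perspective, here is a small sketch of the arithmetic. Only the 1:30 figure for hot nodes comes from the discussion above; the warm and frozen ratios are purely illustrative assumptions to show the shape of the progression, not recommendations:

```python
# Minimal sketch of how a RAM-to-disk ratio translates into per-node data
# volume. Only the 1:30 hot-node figure comes from the discussion above; the
# warm and frozen ratios and the 64GB RAM figure are illustrative assumptions.

def disk_capacity_tb(ram_gb: int, ratio: int) -> float:
    """Disk volume implied by a RAM-to-disk ratio of 1:ratio, in TB."""
    return ram_gb * ratio / 1024

ram_gb = 64  # hypothetical node with 64GB RAM

for tier, ratio in [("hot (<=1:30)", 30),
                    ("warm (illustrative)", 100),
                    ("frozen (illustrative)", 1000)]:
    print(f"{tier:22s} ~{disk_capacity_tb(ram_gb, ratio):.1f} TB per node")
```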

In the end these are, as David says, just high-level guidelines to avoid the most common mistakes. All use cases are different, so there is no substitute for benchmarking and testing with realistic data and workloads if you want to be sure.
