Hot/Warm Architecture with uniform hardware?

Is there any benefit to (or any problem with) data tiering if all nodes use the same type of storage media? In the past, I've seen hot/warm/cold architectures suggested for clusters with different storage media - for example, SSDs for the hot tier, local HDDs for warm, and network-attached HDDs for cold.

Because we only have one type of storage available, our cluster has historically used only hot-tier nodes. We're exploring tiering as a way to adjust the RAM:disk ratios of our nodes - this would let us increase cluster storage without increasing the total number of data nodes to manage.

For background, our cluster looks like the following:

  • VMs with network-attached storage (enterprise-grade HDDs)
  • 3 master nodes (16GB RAM each), 2 ingest nodes, and 15 data nodes (2TB storage, 32GB RAM, 8GB JVM heap each)
  • Cluster disk available: 5.5TB free of 28.2TB total
  • Typical monthly ingest is approx. 2-4TB, including shard replicas

Our current plan for tiering would adjust the data node specs as follows (with a rough ILM sketch after the specs):

Hot Tier:

  • 2TB storage
  • 32GB RAM
  • 16GB JVM heap

Warm Tier:

  • 5TB storage
  • 32GB RAM
  • 8GB JVM heap
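
If we went this route, my understanding is that the nodes would get data_hot / data_warm roles in elasticsearch.yml, and an ILM policy would handle the hot-to-warm migration. A rough sketch of the kind of policy we have in mind - the policy name, rollover thresholds, and 14d cutoff are placeholders, not settled values:

```
# Hypothetical ILM policy: roll over on the hot tier, move to warm after 14 days
PUT _ilm/policy/logs-tiered
{
  "policy": {
    "phases": {
      "hot": {
        "actions": {
          "rollover": {
            "max_primary_shard_size": "50gb",
            "max_age": "7d"
          }
        }
      },
      "warm": {
        "min_age": "14d",
        "actions": {
          "set_priority": { "priority": 50 }
        }
      }
    }
  }
}
```

As I understand it, the warm phase's implicit migrate action is what would actually move the shards onto the data_warm nodes.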

Does anyone see any issues with this? Is tiering only recommended if you have different storage media available?

What is driving this suggested change in cluster topology? What is the problem you are looking to solve?

Which version of Elasticsearch are you using?

What retention period(s) do you have for the data?

How is the performance of the cluster as it is currently configured?

How many nodes of the different types would the proposed hot-warm topology have?

Would this change coincide with any change of ingest volume or retention period?

6 of the 15 data nodes are above the low disk watermark. We've decided to increase overall cluster storage to better absorb occasional logging spikes, reduce time spent rebalancing, and reduce the search latency associated with high disk utilization.
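
For reference, we're gauging the disk pressure with the standard cat/cluster APIs:

```
# Per-node disk usage, worst first
GET _cat/allocation?v=true&s=disk.percent:desc

# Current watermark settings (defaults included)
GET _cluster/settings?include_defaults=true&flat_settings=true
```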

Elasticsearch 8.17.3, with plans to upgrade to 9.x as part of our cluster redesign.

Everyday performance is generally fine. Search latency becomes noticeable once 7 or more nodes are above the low disk watermark, and under that pressure nodes sometimes leave the cluster, triggering rebalancing.

Any maintenance or rebalancing takes a significant amount of time (multiple days) - I suspect due to the low disk availability. Search and indexing latency is high during this time, impacting our ability to use the cluster.
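
For what it's worth, this is how we've been watching the recoveries crawl along (standard cat APIs, nothing custom):

```
# Active shard recoveries with progress and elapsed time
GET _cat/recovery?v=true&active_only=true&h=index,shard,stage,bytes_percent,time

# Relocating/initializing shard counts at a glance
GET _cat/health?v=true
```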

We haven't fully mapped it out, as we aren't sure if hot/warm is even suitable with uniform storage hardware. We would likely aim for 3-4 hot nodes and 4-5 warm nodes based on desired cluster storage.

No plans to change either at this time.

This is not ideal, as Elasticsearch is quite an I/O-intensive data store. A lot of the issues around shard relocation and rebalancing may very well be caused by poor storage performance.

If all nodes have the same slow storage, I do not see any benefit in adopting a hot-warm architecture, as the smaller number of hot nodes would take the full indexing load and likely become even more overloaded than the current 15 nodes are.

Given the current limitations on hardware and storage, I would probably recommend keeping all data nodes at the same specification but increasing storage to limit the need for rebalancing. As indexing is the most I/O-intensive process, I would also recommend adjusting the primary shard count of the indices you are actively indexing into and making sure these shards are as evenly distributed across the cluster as possible (a sketch of what I mean is below). If you are indexing into multiple indices, you may want to keep the number of shards actively being indexed into low on all nodes.
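
As a hedged sketch of that shard distribution idea - the template name, index pattern, and numbers are illustrative, assuming your 15 data nodes and 1 replica:

```
# Illustrative template: 15 primaries + 15 replicas = 2 shards per data node,
# with a per-node cap so the write load spreads evenly
PUT _index_template/logs-active
{
  "index_patterns": ["logs-*"],
  "template": {
    "settings": {
      "index.number_of_shards": 15,
      "index.number_of_replicas": 1,
      "index.routing.allocation.total_shards_per_node": 2
    }
  }
}
```

One caveat: total_shards_per_node is a hard limit, so if a node drops out, shards that would exceed the cap on the remaining nodes can stay unassigned until it returns. Worth weighing before applying it cluster-wide.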

The best way to resolve performance and stability issues would likely be to change to more performant storage.