Volume Sizing Example

Christian_Dahlqvist · November 18, 2019, 5:56pm

Each node has a certain amount of resources available, e.g. CPU, heap space and disk I/O capacity. Indexing, querying and even just storing data all use resources, and therefore compete with each other for them.

Indexing is a very disk I/O intensive process, but it can also use a lot of CPU and needs a good amount of heap space as well. Querying uses the same resources, and the amount required depends on the required query latency as well as the amount of data queried. Just storing data on a node typically just consumes heap.

This is based on empirical observations and is what a lot of users use for hot nodes. Holding relatively little data means querying and storage require fewer resources, which leaves more for indexing. It is all about finding a good balance between indexing, storage and querying.

If you have 3 nodes instead of 15, each node will need to index 5 times as much. They will also store 5 times as much data and handle querying for 5 times the data volume. At the ingest volumes and retention period you mentioned I suspect you will run into resource limitations at a very early stage.
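To make the scaling concrete, here is a rough sketch of the arithmetic. The ingest volume and retention period below are hypothetical placeholders, not the figures from this thread, and the function name `per_node_load` is made up for illustration. It assumes data is spread evenly across nodes and ignores replicas and any difference between raw and on-disk size.

```python
def per_node_load(daily_ingest_gb, retention_days, node_count):
    """Return (GB indexed per node per day, GB stored per node).

    Assumes an even spread across nodes; ignores replicas and
    on-disk expansion or compression.
    """
    if node_count <= 0:
        raise ValueError("node_count must be positive")
    indexed_per_node = daily_ingest_gb / node_count
    stored_per_node = daily_ingest_gb * retention_days / node_count
    return indexed_per_node, stored_per_node


# Hypothetical workload, chosen only to illustrate the ratio.
DAILY_INGEST_GB = 1500
RETENTION_DAYS = 30

for nodes in (15, 3):
    indexed, stored = per_node_load(DAILY_INGEST_GB, RETENTION_DAYS, nodes)
    print(f"{nodes} nodes: {indexed:.0f} GB/day indexed, {stored:.0f} GB stored per node")
```

Whatever numbers you plug in, going from 15 nodes to 3 multiplies both the per-node indexing rate and the per-node stored volume by 5, and the data each node has to serve queries over grows by the same factor.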

You can also have a look at this webinar, which talks about heap usage and how to optimize it.
