Volume Sizing Example

Dear Elastic

In your webinar, you gave the following formula for volume sizing:

I just want to check it with a worked example.

So for 1.5 TB per day retained for 7 days with one replica, Total Data = 1.5 * 7 * 2 = 21 TB.

Total Storage = 21 TB * 1.20 = 25.2 TB

Nodes have 64 GB RAM,

Total Data Nodes = 25200 / 64 / 30 = 13.125 + 1, rounded up = 15 nodes.

Here is my question, which was not clear in the slide: the ideal memory:data ratio was shown to be 1:30 for a hot node, and it was explained that the maximum storage per node was 64 x 30 = 1920 GB, or 1.92 TB.

What happens if I increase the storage per node to have fewer nodes? For example, what would be the effect if I only had three nodes sharing the 25.2 TB of storage, compared to 15 nodes with 1.92 TB each?

Could someone please clarify the memory:data ratio portion of the calculation in the example above for me?

Kind Regards

Magneton

Each node has a certain amount of resources available, e.g. CPU, heap space and disk I/O capacity. Indexing, querying and simply storing data all use resources, and therefore compete with each other for them.

Indexing is a very disk I/O intensive process, but it can also use a lot of CPU and needs a good amount of heap space as well. Querying uses the same resources, and the amount required depends on the required query latency as well as the amount of data queried. Just storing data on a node typically only consumes heap.

The 1:30 ratio is based on empirical observations and is what a lot of users use for hot nodes. Holding relatively little data means querying and storage require fewer resources, which leaves more for indexing. It is all about finding a good balance between indexing, storage and querying.

If you have 3 nodes instead of 15, each node will need to index 5 times as much data. Each node will also store 5 times as much data and handle querying for 5 times the data volume. At the ingest volumes and retention period you mentioned, I suspect you will run into resource limitations at a very early stage.
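
To make the comparison concrete, here is a rough sketch of the sizing arithmetic from the question, applied to both node counts. The function names are just illustrative, and the inputs (1.5 TB per day, 7 days, one replica, 20% overhead, 64 GB RAM, 1:30 ratio) are taken from the example as I read it. Note that 1.5 * 7 * 2 works out to 21 TB, so the total storage with 20% overhead comes to 25.2 TB.

```python
import math

def data_nodes(daily_tb, days, replicas, overhead, ram_gb, ratio):
    """Sizing formula as applied in the question (illustrative helper)."""
    # Total storage in GB: daily volume * retention * copies * overhead factor.
    total_gb = daily_tb * 1000 * days * (replicas + 1) * (1 + overhead)
    # Assumption: round up to whole nodes, then add one spare node.
    nodes = math.ceil(total_gb / (ram_gb * ratio)) + 1
    return total_gb, nodes

def per_node(total_gb, nodes, ram_gb):
    """Storage per node in GB, and GB of data per GB of RAM."""
    storage_gb = total_gb / nodes
    return storage_gb, storage_gb / ram_gb

total_gb, nodes = data_nodes(1.5, 7, 1, 0.20, 64, 30)
for n in (nodes, 3):
    storage_gb, ratio = per_node(total_gb, n, 64)
    print(f"{n} nodes: {storage_gb:.0f} GB per node, memory:data about 1:{ratio:.0f}")
```

With 15 nodes each node holds roughly 1.7 TB, which stays under the 1:30 guideline. With 3 nodes each one holds about 8.4 TB, a ratio of around 1:131, on top of taking 5 times the indexing and query load.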

You can also have a look at this webinar, which talks about heap usage and how to optimize it.

Yes, the ratio is often given relative to the total RAM available to the node, assuming 50% is given to heap, which in practice generally means at most 64GB RAM.
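
As a small illustration of what that means in numbers, the sketch below expresses the same limit relative to RAM and relative to heap. The helper name is made up for this example, and the 50% heap share is the assumption stated above.

```python
def storage_limit(ram_gb, ratio=30, heap_fraction=0.5):
    """Express a memory:data ratio given relative to RAM in terms of heap as well."""
    heap_gb = ram_gb * heap_fraction   # assumption: half of RAM goes to heap
    max_data_gb = ram_gb * ratio       # ratio applied to total RAM
    return heap_gb, max_data_gb, max_data_gb / heap_gb

heap_gb, max_data_gb, data_per_heap_gb = storage_limit(64)
print(f"heap: {heap_gb:.0f} GB, max data: {max_data_gb:.0f} GB, "
      f"data per GB of heap: {data_per_heap_gb:.0f} GB")
```

So a 1:30 ratio relative to 64 GB of RAM is the same limit as 1:60 relative to a 32 GB heap.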

This blog post might also be useful.

Cheers Christian