Request for explanation of the RAM:disk ratio in ES

I came across the term RAM:disk ratio while watching one of the webinars, but I don't really understand the reasoning behind the ratio.

My previous understanding of ES was this: say I have a data node with 64GB of memory, and I have assigned 30GB of that 64GB as JVM heap, so I'm left with 34GB of memory for Lucene.

So ideally, the total index storage on that data node shouldn't exceed 34GB, because if it does, there will be a performance hit.

But with the ratio, for example 1:24, does it mean that with 34GB of RAM my data node can support up to 816GB of indices, since 34GB x 24 = 816GB?

Thank you.

Not exactly. The non-heap memory is used by the OS to cache Lucene files. But it's not used directly by Lucene.

It'd be 30GB (i.e. the heap) x 24.

So the ratio is actually about the ES JVM? Meaning the ratio exists so the JVM heap won't be overwhelmed by the size of the indices?

It's in relation to the heap size, yes.

It's not a hard limit; you can go higher or lower. It's really just a starting point.

It is basically a way to specify the storage capacity of a node in relation to the size of the node measured in RAM. This typically assumes 50% of RAM is given to the heap, as per best practice. The ratio is typically low for search-heavy use cases, as these often require data to be cached in the OS page cache. For logging use cases it can often be much higher, as the amount of data stored on a node is often limited by the heap size.
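
For example (these ratios are purely illustrative, not official recommendations): the same 64GB node might be sized at 1:8 for a search-heavy workload, i.e. roughly 512GB of storage, but at 1:100 for a logging workload, i.e. roughly 6.4TB.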

So, for example, here is a formula I took from one of the webinars:

Total Data Nodes = ROUNDUP(Total Storage (GB) / Memory per data node / Memory:data ratio) + 1 Data node for failover capacity
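
For instance, plugging in some made-up numbers (10,000GB of total storage, 64GB of RAM per data node, a 1:24 ratio, and taking "memory" to mean total RAM for now): ROUNDUP(10000 / 64 / 24) + 1 = ROUNDUP(6.51) + 1 = 7 + 1 = 8 data nodes.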

For the memory per data node, should I put in the full amount of RAM or the heap size (whose max value would be 30GB)?

Which webinar are you referring to?

elasticsearch-sizing-and-capacity-planning

That calculation seems correct.

I have not watched that one, but I would expect it to use the same convention used on Elastic Cloud. There, a 16GB node indicates the amount of RAM, and it has an 8GB heap. If it had a disk-to-memory ratio of 10, that would mean 160GB of storage.

Sorry, I don't quite understand the meaning here. Does it mean the memory per data node is actually the available heap memory?

No. Elasticsearch needs a good amount of off-heap memory to function well, which is why it is recommended to give 50% of available RAM to the heap.

I think @zidane28 is confused because all the literature, and you yourself, calculate the RAM:disk ratio with the server RAM and NOT with the heap. But @warkolm said it was the heap amount that went into this ratio.

Which puts us in a pickle :slight_smile: I too would say RAM is the number used everywhere I have ever seen this ratio concept referenced.

You can calculate a ratio against heap if you want, but if people did that, they would no longer be talking about the same thing in these discussions.

Maybe Mark just made a mistake in the heat of the moment, or else even I am confused by this thread?

To be clear, the ratio can vary from around 1:8 to 1:500 depending on the use case, and it always assumes ~50% of RAM is heap with a max RAM of ~64GB, but we should still all calculate it the same way or it stops making any sense :wink:

The amount of heap is often what limits how much data you can store on a node, and increasing off-heap memory does not necessarily affect how much data the node can hold. It therefore makes a lot of sense to relate the ratio to heap, although as far as I know that is not the way it is generally done.

Sorry for the late reply; I have been away somewhere with limited internet access.

Based on the points above, can I conclude that the max GB of indices a data node can support is:

30GB (the max heap that ES supports optimally) * (the ratio you want to use)?

So in this case, if the ratio is 1:16, the GB of indices the data node can hold is 30 * 16 = 480GB?

On Elastic Cloud you can create nodes of different sizes, but the relationship between CPU, RAM (and therefore heap), and storage is constant per node type. This is where I believe the RAM-to-disk ratio came from, as it is used to describe how much storage you get allocated in relation to the size of the node. It can therefore describe how much storage a node has, or just as well how much data it holds.

If you have a node with 16GB of RAM (8GB heap) and your node type is highio, which offers a RAM-to-disk ratio of 1:30, the node will have 16 * 30GB = 480GB of storage attached. This typically works as long as the heap size is 50% of allocated RAM.
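
Here is a minimal sketch of that arithmetic in Python, combining the webinar formula with the Cloud convention (the function names and sample numbers are mine, just for illustration):

```python
import math

def node_storage_gb(ram_gb: float, ram_to_disk_ratio: float) -> float:
    """Storage a node gets: total RAM times the RAM-to-disk ratio.

    Assumes the usual best practice of giving 50% of RAM to the heap,
    which is the assumption the ratio convention is built on.
    """
    return ram_gb * ram_to_disk_ratio

def data_nodes_needed(total_storage_gb: float, ram_gb: float,
                      ram_to_disk_ratio: float) -> int:
    """The webinar formula: ROUNDUP(storage / RAM / ratio) + 1 failover node."""
    return math.ceil(total_storage_gb / ram_gb / ram_to_disk_ratio) + 1

# The highio example above: 16GB RAM (8GB heap) at a 1:30 ratio.
print(node_storage_gb(16, 30))          # 480 GB of storage per node
print(data_nodes_needed(4800, 16, 30))  # 11 nodes (10 for data + 1 failover)
```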

Does that make sense?

Yes it does! Thanks!
