I came across the term RAM:disk ratio while watching one of the webinars, but I don't really understand how the ratio works.
My understanding of ES so far is that if, for example, I have a data node with 64GB of memory and I assign 30GB of it as JVM heap, I'm left with 34GB of memory for Lucene.
So ideally the total index storage on that data node shouldn't exceed 34GB, because if it does there will be a performance hit.
But with a ratio of, for example, 1:24, does that mean that with 34GB of RAM my data node can support up to 816GB of indices, since 34GB x 24 = 816GB?
It is basically a way to specify the storage capacity of a node in relation to the size of the node measured in RAM. This typically assumes 50% of RAM is given to the heap, as per best practice. The ratio is typically low for search-heavy use cases, as these often require data to be cached in the OS page cache. For logging use cases it can, however, often be much higher, as the amount of data stored on a node is often limited by the heap size.
I have not watched that one, but I would expect it to use the same convention used on Elastic Cloud. There, a 16GB node indicates the amount of RAM, and such a node has an 8GB heap. If it had a disk-to-memory ratio of 10, that would mean 160GB of storage.
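As a concrete illustration of that convention (a minimal sketch with made-up helper names, not anything from the webinar or Elastic Cloud itself), the arithmetic is simply node RAM times the ratio, with heap assumed to be half the RAM:

```python
# Sketch of the convention described above (hypothetical helpers, not an official API):
# a node's size is given in RAM, heap is ~50% of that, and storage is RAM times the ratio.

def heap_gb(ram_gb: float) -> float:
    """Heap is conventionally ~50% of node RAM."""
    return ram_gb / 2

def node_storage_gb(ram_gb: float, ram_to_disk_ratio: float) -> float:
    """Storage attached to a node sized by a RAM-to-disk ratio."""
    return ram_gb * ram_to_disk_ratio

print(heap_gb(16))              # 8.0   -> 8GB heap on a 16GB-RAM node
print(node_storage_gb(16, 10))  # 160.0 -> 160GB storage at a 1:10 ratio
```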
I think @zidane28 is confused because all the literature, and you yourself, calculate the RAM:disk ratio with the server RAM and NOT with the heap. But @warkolm said it was the heap amount that went into this ratio.
Which puts us in a pickle. I too would say RAM is the number used everywhere I have ever seen this ratio concept referenced.
You can calculate a ratio with heap if you want, but if people did that they would not be talking about the same thing when they have these discussions.
Maybe Mark just made a mistake in the heat of the moment, or else even I am confused following this thread?
To be clear, the ratio can vary from roughly 1:8 to 1:500 depending on the use case, and it always assumes ~50% of RAM is heap with a maximum of ~64GB of RAM, but we should still all calculate it the same way or else it stops making any sense.
The amount of heap is often what limits how much data you can store on a node, and increasing off-heap memory does not necessarily affect how much data the node can hold. It would therefore make a lot of sense to relate the ratio to heap, although as far as I know that is not the way it is generally done.
On Elastic Cloud you can create nodes of different sizes, but the relation between CPU, RAM (and therefore heap) and storage is constant per node type. This is where I believe the RAM-to-disk ratio came from, as it is used to describe how much storage you get allocated in relation to the size of the node. It can therefore describe how much storage a node has, or just as well how much data it holds.
If you have a node with 16GB RAM (8GB heap) and your node type is highio, which offers a RAM-to-disk ratio of 1:30, this node will have 16 * 30 GB = 480GB of storage attached. This typically works as long as the heap size is 50% of the allocated RAM.
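The same arithmetic can also be run in reverse for rough capacity planning. Here is a hedged sketch (hypothetical helper names and illustrative numbers, not an official sizing tool) that reproduces the highio example above and estimates how many such nodes a given data volume would need:

```python
import math

def storage_per_node_gb(node_ram_gb: float, ram_to_disk_ratio: float) -> float:
    """Storage attached to one node: RAM * ratio (assumes heap = 50% of RAM, RAM <= ~64GB)."""
    return node_ram_gb * ram_to_disk_ratio

def nodes_needed(total_data_gb: float, node_ram_gb: float, ram_to_disk_ratio: float) -> int:
    """Rough number of data nodes needed to hold a given data volume."""
    return math.ceil(total_data_gb / storage_per_node_gb(node_ram_gb, ram_to_disk_ratio))

print(storage_per_node_gb(16, 30))  # 480.0 -> the highio example: 16GB RAM at 1:30
print(nodes_needed(5000, 16, 30))   # 11    -> ~11 such nodes for an illustrative 5TB of data
```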