I am a newbie exploring Elasticsearch and having a hard time understanding how Elasticsearch handles data that is larger than the RAM size.
I am going through the documentation, and it says to allocate 50% of your RAM to Elasticsearch. The documentation also says to disable swapping.
Elasticsearch does not store all data on the heap. Instead, data is read from disk when required, and the heap is essentially used as working memory. This is why the heap should be at most 50% of available RAM (ideally as small as the use case allows). The rest of the available RAM is used for off-heap storage and the operating system's page cache, both of which are essential for good performance.
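As an illustration of that 50% rule, you can pin the heap size explicitly through the JVM options. This is a minimal sketch assuming a machine with 8 GB of RAM; the `4g` value and the exact file path (`config/jvm.options.d/heap.options`) are examples and will vary with your install:

```
# config/jvm.options.d/heap.options (path may differ by installation)
# Set min and max heap to the same value, at most 50% of the machine's RAM.
# Example for a machine with 8 GB of RAM:
-Xms4g
-Xmx4g
```

Setting `-Xms` and `-Xmx` to the same value avoids heap resizing pauses, and everything above that 4 GB line is left to the page cache, which is what actually serves most of your on-disk data.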
Most operating systems try to use as much memory as possible for file system caches and eagerly swap out unused application memory. This can result in parts of the JVM heap or even its executable pages being swapped out to disk.
Swapping is very bad for performance and node stability and should be avoided at all costs. It can cause garbage collections to last for minutes instead of milliseconds and can cause nodes to respond slowly or even disconnect from the cluster. In a resilient distributed system, it's more effective to let the operating system kill the node.
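One documented way to prevent the OS from swapping Elasticsearch memory is to enable memory locking. A minimal sketch of the setting:

```
# elasticsearch.yml — ask the JVM to lock its address space into RAM
# so the operating system cannot swap Elasticsearch memory out to disk
bootstrap.memory_lock: true
```

For this to work, the user running Elasticsearch must be permitted to lock memory (for example via `ulimit -l unlimited`, or `LimitMEMLOCK=infinity` on systemd installs). After startup you can verify it took effect with `GET _nodes?filter_path=**.mlockall`. Disabling swap entirely at the OS level (e.g. `sudo swapoff -a`) is another option the documentation mentions, if the machine is dedicated to Elasticsearch.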