We are doing performance testing and I am facing a peculiar issue. My cluster has around 8M docs and a total store size of 2.7 GB. We ran the tests by simulating 50 concurrent virtual users. When I look at the stats, the field data cache is 8 MB and the filter cache is 17 MB, yet the JVM heap consumed is 10.5 GB. Could you please help me understand how the heap can be so big when the actual data is quite small?
Some stats for reference
health status index pri rep docs.count docs.deleted store.size pri.store.size
green open test_defaultindex 5 0 8802240 0 2.7gb 2.7gb
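(For reference, the cache and heap figures above can be pulled from the cat and node stats APIs; a rough sketch, assuming the node listens on localhost:9200 and that the stats field names match your ES version:)

    # Index-level overview (the table above)
    curl -s 'localhost:9200/_cat/indices?v'

    # Per-node cache and heap usage; the fielddata, filter cache and heap numbers
    # live under fields such as indices.fielddata.memory_size,
    # indices.filter_cache.memory_size and jvm.mem.heap_used
    curl -s 'localhost:9200/_nodes/stats/indices,jvm?pretty&human'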
This is OK. The JVM heap holds Elasticsearch's internal Java data structures; it is not proportional to the index size. Index files are read via memory mapping or from the file system cache.
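One way to see where the heap actually goes is to compare the heap used against the memory held on-heap by Lucene segments and the caches. A minimal sketch, assuming the node is on localhost:9200 and an ES version that exposes segment stats:

    # JVM heap usage of the node
    curl -s 'localhost:9200/_nodes/stats/jvm?pretty&human'

    # On-heap memory held by Lucene segments (terms dictionaries, norms, etc.)
    curl -s 'localhost:9200/_cat/segments?v'
    curl -s 'localhost:9200/test_defaultindex/_stats/segments?pretty&human'

Keep in mind that the "heap used" figure also counts garbage that has not been collected yet, so under concurrent load it can sit far above what the live data structures actually need.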
Thanks for your reply. Is the size of the internal data structures influenced by the heap size allocated? When I look at the benchmark results at http://benchmarks.elasticsearch.org/, with a 4 GB heap they are able to work with 6.9M short documents (log lines, 14 GB of JSON in total) without any OOMEs.
Since I am running these tests for production capacity planning, how should I go about finalizing the JVM heap size based on the stats from the benchmark testing and the stats found in my own testing?
The JVM max heap size is a setting that should be sized in relation to the power of the node hardware (CPU cores, CPU speed) and is limited by RAM and the memory you want to leave for the file system cache (a minimal sizing sketch follows after this list).
The more CPU cores and the higher the CPU speed, the larger you may set the JVM max heap size.
The JVM must perform garbage collection (GC), which gets very expensive on huge heaps (> 8 GB); ES logs slow GC runs to help you track down such issues.
The heap usage pattern is mainly driven by your indexing/query style and workload; you can use jvisualvm to monitor this (see the JMX sketch after this list).
As with any Java software, massive concurrency demands large heaps, but Elasticsearch is designed to scale out over the number of nodes, so it does not depend strictly on having a huge heap.
There is a lower bound on the max heap size below which you will get OOMs, but where it lies depends on your workload.
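As a sketch of how the max heap is set (assuming the standard startup scripts of the 1.x/2.x releases; the usual rules of thumb are to give Elasticsearch at most about half of the machine's RAM, leaving the rest for the file system cache, and to stay below ~32 GB so compressed object pointers remain in effect):

    # Set the heap via the ES_HEAP_SIZE environment variable before starting the node,
    # e.g. 8 GB on a 16 GB machine
    export ES_HEAP_SIZE=8g

    # Or pass explicit JVM flags; keep -Xms and -Xmx identical to avoid heap resizing
    export ES_JAVA_OPTS="-Xms8g -Xmx8g"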
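And a minimal sketch for attaching jvisualvm to a remote node; locally it can attach to the Elasticsearch process directly, but a remote box needs JMX exposed. The port and the disabled authentication here are assumptions for a test environment only:

    # Standard JVM flags to expose JMX for jvisualvm (test setups only)
    export ES_JAVA_OPTS="-Dcom.sun.management.jmxremote \
      -Dcom.sun.management.jmxremote.port=9010 \
      -Dcom.sun.management.jmxremote.authenticate=false \
      -Dcom.sun.management.jmxremote.ssl=false"

    # Then add a JMX connection in jvisualvm pointing at <host>:9010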