JVM heap consumed is too big compared to actual data

Hi All,

We are doing performance testing and I am facing a peculiar issue. My cluster has around 8M docs and the total store size is 2.7 GB. We tested by simulating 50 concurrent virtual users. When I look into the stats, the field data cache is 8 MB and the filter cache is 17 MB, yet the JVM heap consumed is 10.5 GB. Could you please help me understand how the heap can be so big when the actual data is quite small?

Some stats for reference:

health status index             pri rep docs.count docs.deleted store.size pri.store.size
green  open   test_defaultindex   5   0    8802240            0      2.7gb          2.7gb
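
For reference, the line above comes from the cat indices API; assuming a node reachable on localhost:9200, it can be reproduced with:

    curl -s 'localhost:9200/_cat/indices/test_defaultindex?v'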

Node stats:

"jvm" : {
"timestamp" : 1453284261878,
"uptime_in_millis" : 616215032,
"mem" : {
"heap_used_in_bytes" : 10151601296,
"heap_used_percent" : 31,
"heap_committed_in_bytes" : 32011649024,
"heap_max_in_bytes" : 32011649024,
"non_heap_used_in_bytes" : 62484008,
"non_heap_committed_in_bytes" : 62914560,
"pools" : {
"young" : {
"used_in_bytes" : 387364552,
"max_in_bytes" : 1605304320,
"peak_used_in_bytes" : 1605304320,
"peak_max_in_bytes" : 1605304320
},
"survivor" : {
"used_in_bytes" : 4530720,
"max_in_bytes" : 200605696,
"peak_used_in_bytes" : 200605696,
"peak_max_in_bytes" : 200605696
},
"old" : {
"used_in_bytes" : 9759706024,
"max_in_bytes" : 30205739008,
"peak_used_in_bytes" : 9759706024,
"peak_max_in_bytes" : 30205739008
}
}
}

"transport" : {
"server_open" : 13,
"rx_count" : 12,
"rx_size_in_bytes" : 3928,
"tx_count" : 12,
"tx_size_in_bytes" : 3928
},
"http" : {
"current_open" : 1,
"total_opened" : 30214
}
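
The sections above come from the node stats API; the fielddata and filter cache numbers I quoted sit under the indices metric. Assuming a node on localhost:9200:

    curl -s 'localhost:9200/_nodes/stats/indices,jvm,transport,http?pretty'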

This is OK. The JVM heap is an internal store for Java data structures; it is not proportional to the index size. Index files are read via memory mapping or from the file system cache.
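
If you want to check where the heap actually goes, two things you could try (a sketch, assuming a node on localhost:9200 and a JDK on the box; <es-pid> is a placeholder for the Elasticsearch process id):

    # heap held per node by Lucene segments (terms index, norms, ...)
    curl -s 'localhost:9200/_nodes/stats/indices/segments?pretty'

    # class histogram of the live heap, straight from the JVM
    jmap -histo:live <es-pid> | head -30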

Thanks for your reply. Is the size of those internal data structures influenced by the heap size allocated? When I look at the benchmark results at http://benchmarks.elasticsearch.org/, with a 4 GB heap they are able to work with 6.9M short documents (log lines, 14 GB of JSON in total) without any OOMEs.

As I am doing all these tests for capacity planning for prod, how should I go about finalizing the JVM heap size based on the stats given in the benchmark testing and the stats found in my own testing?

Here are some thoughts for consideration:

  • JVM max heap size is a setting that should be sized in relation to the power of the node hardware (CPU cores, CPU speed) and is limited by RAM and by the memory you want to leave to the file system cache
  • the more CPU cores and the higher the CPU speed, the larger you may set the JVM max heap size
  • the JVM must perform garbage collection (GC), which gets very expensive on huge heaps > 8 GB; ES logs slow GCs to help you address such issues (see the sketch after this list)
  • the heap usage pattern is mainly driven by your indexing/query style and workload; you can use jvisualvm to monitor this
  • as in all Java software, massive concurrency demands large heaps, but Elasticsearch is designed to scale over the number of nodes, so it does not depend strictly on having a huge heap
  • there is a bottom line of max heap size below which you get OOMs, but that depends on your workload
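
To illustrate the first and third points, here is one way to pin the heap size and make GCs visible in a log. This is a sketch assuming Elasticsearch 1.x/2.x started via the shell script and a HotSpot JVM; the heap size and log path are example values, not recommendations:

    # fix min and max heap to the same value before starting the node
    export ES_HEAP_SIZE=8g

    # have the JVM log every collection so slow GCs can be correlated with load
    export ES_JAVA_OPTS="-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -Xloggc:/var/log/elasticsearch/gc.log"

    ./bin/elasticsearch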

Thank you for your response. It definitely helps.