JVM heap consumed is too big compared to actual data

Hi All,

We are doing performance testing and I am facing a peculiar issue. My cluster has around 8M docs and the total store size is 2.7 GB. We tested by simulating 50 concurrent virtual users. When I look into the stats, the field data cache is 8 MB and the filter cache is 17 MB, yet the JVM heap consumed is 10.5 GB. Could you please help me understand how the heap can be so big when the actual data is quite small?

Some stats for reference:

health status index             pri rep docs.count docs.deleted store.size pri.store.size
green  open   test_defaultindex   5   0    8802240            0      2.7gb          2.7gb
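
For reference, the line above comes from the cat indices API; assuming a node reachable on localhost:9200, it can be reproduced with:

    curl -s 'localhost:9200/_cat/indices/test_defaultindex?v'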

Node stats:

"jvm" : {
"timestamp" : 1453284261878,
"uptime_in_millis" : 616215032,
"mem" : {
"heap_used_in_bytes" : 10151601296,
"heap_used_percent" : 31,
"heap_committed_in_bytes" : 32011649024,
"heap_max_in_bytes" : 32011649024,
"non_heap_used_in_bytes" : 62484008,
"non_heap_committed_in_bytes" : 62914560,
"pools" : {
"young" : {
"used_in_bytes" : 387364552,
"max_in_bytes" : 1605304320,
"peak_used_in_bytes" : 1605304320,
"peak_max_in_bytes" : 1605304320
},
"survivor" : {
"used_in_bytes" : 4530720,
"max_in_bytes" : 200605696,
"peak_used_in_bytes" : 200605696,
"peak_max_in_bytes" : 200605696
},
"old" : {
"used_in_bytes" : 9759706024,
"max_in_bytes" : 30205739008,
"peak_used_in_bytes" : 9759706024,
"peak_max_in_bytes" : 30205739008
}
}
}

"transport" : {
"server_open" : 13,
"rx_count" : 12,
"rx_size_in_bytes" : 3928,
"tx_count" : 12,
"tx_size_in_bytes" : 3928
},
"http" : {
"current_open" : 1,
"total_opened" : 30214
}
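
The sections above come from the node stats API; the fielddata and filter cache numbers I quoted sit under the indices metric. Assuming a node on localhost:9200:

    curl -s 'localhost:9200/_nodes/stats/indices,jvm,transport,http?pretty'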

This is OK. The JVM heap is an internal store for Java data structures; it is not proportional to the index size. Index files are read via memory mapping or from the file system cache.
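
If you want to check where the heap actually goes, two things you could try (a sketch, assuming a node on localhost:9200 and a JDK on the box; <es-pid> is a placeholder for the Elasticsearch process id):

    # heap held per node by Lucene segments (terms index, norms, ...)
    curl -s 'localhost:9200/_nodes/stats/indices/segments?pretty'

    # class histogram of the live heap, straight from the JVM
    jmap -histo:live <es-pid> | head -30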

Thanks for your reply. Is the size of those internal data structures influenced by the heap size allocated? When I look at the benchmark results at http://benchmarks.elasticsearch.org/, with a 4 GB heap they are able to work with 6.9M short documents (log lines, 14 GB of JSON in total) without any OOMEs.

As I am doing all these tests for capacity planning for prod, how should I go about finalizing the JVM heap size based on the stats given in the benchmark testing and the stats found in my own testing?

Here are some thoughts for consideration:

  • JVM max heap size is a setting that should be sized in relation to the power of the node hardware (CPU cores, CPU speed) and is limited by RAM and by the memory you want to leave to the file system cache
  • the more CPU cores and the higher the CPU speed, the larger you may set the JVM max heap size
  • the JVM must perform garbage collection (GC), which gets very expensive on huge heaps > 8 GB; ES logs slow GCs to help you address such issues (see the sketch after this list)
  • the heap usage pattern is mainly driven by your indexing/query style and workload; you can use jvisualvm to monitor this
  • as in all Java software, massive concurrency demands large heaps, but Elasticsearch is designed to scale over the number of nodes, so it does not depend strictly on having a huge heap
  • there is a bottom line of max heap size below which you get OOMs, but that depends on your workload
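
To illustrate the first and third points, here is one way to pin the heap size and make GCs visible in a log. This is a sketch assuming Elasticsearch 1.x/2.x started via the shell script and a HotSpot JVM; the heap size and log path are example values, not recommendations:

    # fix min and max heap to the same value before starting the node
    export ES_HEAP_SIZE=8g

    # have the JVM log every collection so slow GCs can be correlated with load
    export ES_JAVA_OPTS="-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -Xloggc:/var/log/elasticsearch/gc.log"

    ./bin/elasticsearch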

Thank you for your response. It definitely helps.