Heap consumption


(Kiryam) #1

Need help with memory consumption.

This is our cluster:

For about the first 3 hours after start, heap usage follows the green line; after two or more hours we see the red line (heap memory usage).
[screenshot: 2015-10-30 17-02-58]

After that the node crashed.
GC printed messages like this in the logs:

[2015-10-28 20:37:42,134][WARN ][monitor.jvm              ] [process8] [gc][old][1312][37] duration [21s], collections [1]/[21.7s], total [21s]/[7.3m], memory [19.4gb]->[18.1gb]/[19.9gb], all_pools {[young] [231.3mb]->[114.6mb]/[532.5mb]}{[survivor] [66.5mb]->[0b]/[66.5mb]}{[old] [19.1gb]->[18gb]/[19.3gb]}
[2015-10-28 20:38:07,962][WARN ][monitor.jvm              ] [process8] [gc][old][1315][38] duration [23.2s], collections [1]/[23.7s], total [23.2s]/[7.7m], memory [19.2gb]->[18gb]/[19.9gb], all_pools {[young] [8.5mb]->[16.1mb]/[532.5mb]}{[survivor] [66.5mb]->[0b]/[66.5mb]}{[old] [19.1gb]->[18gb]/[19.3gb]}
[2015-10-28 20:38:34,725][WARN ][monitor.jvm              ] [process8] [gc][old][1322][39] duration [20.6s], collections [1]/[20.7s], total [20.6s]/[8m], memory [19.7gb]->[18.1gb]/[19.9gb], all_pools {[young] [479.9mb]->[60.6mb]/[532.5mb]}{[survivor] [66.5mb]->[0b]/[66.5mb]}{[old] [19.2gb]->[18gb]/[19.3gb]}
[2015-10-28 20:39:00,554][WARN ][monitor.jvm              ] [process8] [gc][old][1325][40] duration [23.1s], collections [1]/[23.8s], total [23.1s]/[8.4m], memory [19.4gb]->[18.1gb]/[19.9gb], all_pools {[young] [491.6mb]->[87.7mb]/[532.5mb]}{[survivor] [66.5mb]->[0b]/[66.5mb]}{[old] [18.9gb]->[18gb]/[19.3gb]}
[2015-10-28 20:39:25,743][WARN ][monitor.jvm              ] [process8] [gc][old][1333][41] duration [18s], collections [1]/[18.1s], total [18s]/[8.7m], memory [19.5gb]->[18.1gb]/[19.9gb], all_pools {[young] [461.7mb]->[49.5mb]/[532.5mb]}{[survivor] [66.5mb]->[0b]/[66.5mb]}{[old] [19gb]->[18gb]/[19.3gb]}
[2015-10-28 20:39:52,107][WARN ][monitor.jvm              ] [process8] [gc][old][1343][42] duration [16.9s], collections [1]/[17.3s], total [16.9s]/[9m], memory [19.4gb]->[18.1gb]/[19.9gb], all_pools {[young] [429.8mb]->[65.4mb]/[532.5mb]}{[survivor] [66.5mb]->[0b]/[66.5mb]}{[old] [18.9gb]->[18gb]/[19.3gb]}
[2015-10-28 20:40:19,758][WARN ][monitor.jvm              ] [process8] [gc][old][1349][43] duration [22.3s], collections [1]/[22.6s], total [22.3s]/[9.4m], memory [19.2gb]->[18.1gb]/[19.9gb], all_pools {[young] [265.9mb]->[87.5mb]/[532.5mb]}{[survivor] [66.5mb]->[0b]/[66.5mb]}{[old] [18.9gb]->[18gb]/[19.3gb]}
[2015-10-28 20:40:44,749][WARN ][monitor.jvm              ] [process8] [gc][old][1352][44] duration [22.7s], collections [1]/[22.9s], total [22.7s]/[9.7m], memory [19.2gb]->[18gb]/[19.9gb], all_pools {[young] [284.5mb]->[5.2mb]/[532.5mb]}{[survivor] [66.5mb]->[0b]/[66.5mb]}{[old] [18.9gb]->[18gb]/[19.3gb]}
[2015-10-28 21:57:08,889][INFO ][monitor.jvm              ] [process8] [gc][old][5920][279] duration [15.1s], collections [2]/[15.7s], total [15.1s]/[10.3m], memory [19.3gb]->[18.5gb]/[19.9gb], all_pools {[young] [209.6mb]->[70.2mb]/[532.5mb]}{[survivor] [66.5mb]->[0b]/[66.5mb]}{[old] [19gb]->[18.4gb]/[19.3gb]}

We have some custom settings in elasticsearch.yml:

bootstrap.mlockall: true
index.query.bool.max_clause_count: 100000
threadpool.bulk.queue_size: 300
http.max_content_length: 500mb
transport.tcp.compress: true
index.load_fixed_bitset_filters_eagerly: false

Here is a screenshot from the heap dump analyzer (we also have the full memory dump, about 26 GB):

Class Name: org.elasticsearch.common.cache.LocalCache$LocalManualCache @ 0xdee6ad78
Shallow Heap: 16
Retained Heap: 11,148,875,400 (42.23%)


(Mark Walkom) #2

And what do you want help with, exactly?
Your GC pauses are staying below 30 seconds, so are you seeing other issues?


(Kiryam) #3

After the heap fills up the node goes down: responses get slow and it then gets kicked out of the cluster.
We are now trying the G1 garbage collector. It seems to help.
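
For anyone curious, roughly how we enable it (just a sketch, assuming the stock startup scripts; in our version the default CMS flags come from bin/elasticsearch.in.sh, so if the JVM complains about conflicting collectors they need to be removed or overridden there):

# sketch: pass the G1 flags via the ES_JAVA_OPTS environment variable
export ES_JAVA_OPTS="-XX:-UseConcMarkSweepGC -XX:+UseG1GC"
bin/elasticsearch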


(Mike Simos) #4

Hi,

You should check to see how much field data and segment memory are being used since you're running out of heap:

GET /_nodes/stats?human
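
If the full nodes stats output is too noisy, you can narrow it down to just those sections, for example (exact paths may differ slightly between versions):

GET /_nodes/stats/indices/fielddata,segments?human
GET /_cat/fielddata?v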

It seems like you're under a lot of memory pressure and Java can't free up enough memory. I'm not sure, but LocalManualCache could be related to field data. If the above API shows most of your heap being used for field data, then I'd recommend using doc values:

https://www.elastic.co/guide/en/elasticsearch/guide/current/doc-values.html
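
For example, doc values are enabled per field in the mapping when the index is created; something like this sketch (the index, type and field names are just placeholders, and in your version doc values only work on not_analyzed string, numeric and date fields):

PUT /my_index
{
  "mappings": {
    "my_type": {
      "properties": {
        "status": {
          "type": "string",
          "index": "not_analyzed",
          "doc_values": true
        }
      }
    }
  }
}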

Otherwise, as a temporary workaround until you implement doc values, set indices.fielddata.cache.size to something like 30%, or whatever value stops the node from going OOM:

https://www.elastic.co/guide/en/elasticsearch/reference/current/modules-fielddata.html#modules-fielddata
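
That is a static setting, so it goes into elasticsearch.yml on every data node and needs a restart to take effect; 30% is just a starting point to tune:

indices.fielddata.cache.size: 30%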


(system) #5