Kibana 4.4 memory leak and getting whacked by the OOM killer


(Michael) #1

Restarted Kibana 4.4 with ES 2.2 on a fresh RHEL6 VMware installation with 4 GB of RAM. No browser is accessing Kibana, and nothing is talking to it. node_options is set to --max-old-space-size=512, but Kibana does not seem to keep its memory at 512 MB.

Current output:
ps aux|grep node
kibana 11978 0.8 15.7 1477388 618512 pts/0 Sl 09:34 0:32 /opt/kibana/bin/../node/bin/node --max-old-space-size=512 /opt/kibana/bin/../src/cli

A short time later (~900 MB):
#ps aux|grep node
kibana 11978 0.7 23.9 1869632 936600 pts/0 Sl 09:34 0:54 /opt/kibana/bin/../node/bin/node --max-old-space-size=512 /opt/kibana/bin/../src/cli

As you can see, it started at ~600 MB of memory and keeps growing. Eventually it will run this box out of memory and the OOM killer will kill it.
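
For anyone checking the numbers: the sixth column of `ps aux` is RSS in KiB. Here is a minimal Node.js sketch that converts the two samples above to MB. `parsePsAuxLine` is a made-up helper name for illustration, not something that ships with Kibana or Node.js.

```javascript
// Sketch: extract VSZ and RSS (ps reports both in KiB) from a `ps aux` line.
// parsePsAuxLine is a hypothetical helper, not part of Kibana or Node.js.
function parsePsAuxLine(line) {
  // Columns: USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND...
  const cols = line.trim().split(/\s+/);
  return {
    pid: Number(cols[1]),
    vszMB: Number(cols[4]) / 1024,
    rssMB: Number(cols[5]) / 1024,
  };
}

// The two samples posted above.
const first = parsePsAuxLine(
  'kibana 11978 0.8 15.7 1477388 618512 pts/0 Sl 09:34 0:32 ' +
  '/opt/kibana/bin/../node/bin/node --max-old-space-size=512 /opt/kibana/bin/../src/cli'
);
const later = parsePsAuxLine(
  'kibana 11978 0.7 23.9 1869632 936600 pts/0 Sl 09:34 0:54 ' +
  '/opt/kibana/bin/../node/bin/node --max-old-space-size=512 /opt/kibana/bin/../src/cli'
);

console.log('first sample RSS (MB):', first.rssMB.toFixed(1));
console.log('later sample RSS (MB):', later.rssMB.toFixed(1));
console.log('growth (MB):', (later.rssMB - first.rssMB).toFixed(1));
```

Both samples are the same PID, so the growth is within a single process rather than a restart.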

Why isn't Kibana's node.js respecting the --max-old-space-size=512 parameter?

Is there a fix for this issue? I don't want to have to restart Kibana every few hours to keep it from running the VM out of RAM.

Thanks,
Mike


(Spencer Alger) #2

from: https://github.com/nodejs/node/issues/2738#issuecomment-148784686

Keep in mind that --max-old-space-size specifies the heap limit for the v8 JS engine that powers Node.js. This doesn't include all the memory the process might be using, such as buffers (for example, if you load very large images or JSON files). OSX will show you the entire memory usage of the node process in its activity monitor. Use process.memoryUsage() programmatically to see heap memory usage.

Surprising to hear that it doesn't seem to run garbage collection at all. Heap allocation is done incrementally... it should run GC's before allocating more heap. Can you use --trace-gc flag to identify when GC is being run?

Maybe that explains it? I'm confident that node.js is respecting the argument as expected.
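
To illustrate the distinction the quote draws, here is a minimal sketch that prints the V8 heap figures next to the resident set size of the whole process. It uses only `process.memoryUsage()`; the `memorySnapshot` and `toMB` names are just for this example, and it is meant to be run as a standalone script rather than inside Kibana.

```javascript
// Sketch: compare the V8 heap with the footprint of the whole process.
// memorySnapshot and toMB are illustrative names, not a Node.js API.
const toMB = (bytes) => bytes / 1024 / 1024;

function memorySnapshot() {
  const m = process.memoryUsage(); // values are in bytes
  return {
    rssMB: toMB(m.rss),             // resident set size of the whole process
    heapTotalMB: toMB(m.heapTotal), // heap memory V8 has reserved
    heapUsedMB: toMB(m.heapUsed),   // heap memory occupied by JS objects
  };
}

const snap = memorySnapshot();
console.log('rss       (MB):', snap.rssMB.toFixed(1));
console.log('heapTotal (MB):', snap.heapTotalMB.toFixed(1));
console.log('heapUsed  (MB):', snap.heapUsedMB.toFixed(1));
```

`ps` reports RSS, which is the first figure; --max-old-space-size only constrains part of what the other two measure.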


(Michael) #3

Added --trace-gc to node_options and am restarting Kibana now. It's at 1.4 GB and the VM is about out of memory. I'll update once I get Kibana restarted.

update:
currently setting the node_options to
--expose-gc --trace-gc --trace-gc-verbose --max-old-space-size=200

just to see what happens at 200M
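
For reference, --expose-gc makes a `gc()` function available on the global object, so a script can force a full collection and see how much heap it frees. A rough sketch, assuming it runs as a standalone script (I have not wired anything like this into Kibana itself); `forceGcAndMeasure` is a made-up name:

```javascript
// Sketch: force a full GC when node was started with --expose-gc and
// report how much heap it freed. forceGcAndMeasure is a hypothetical helper.
function forceGcAndMeasure() {
  if (typeof global.gc !== 'function') {
    return null; // node was started without --expose-gc
  }
  const before = process.memoryUsage().heapUsed;
  global.gc(); // synchronous full collection
  const after = process.memoryUsage().heapUsed;
  return { beforeBytes: before, afterBytes: after, freedBytes: before - after };
}

const result = forceGcAndMeasure();
if (result === null) {
  console.log('gc() not exposed; start node with --expose-gc');
} else {
  console.log('heapUsed before:', result.beforeBytes, 'after:', result.afterBytes);
}
```

If a forced collection frees little heap while RSS keeps climbing, that points at memory outside the V8 heap.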


(Michael) #4

Not sure how it's calculating things:

ps aux|grep node
kibana 17468 0.9 11.7 1314596 459236 pts/0 Sl 13:20 0:25 /opt/kibana/bin/../node/bin/node --expose-gc --trace-gc --trace-gc-verbose --max-old-space-size=200 /opt/kibana/bin/../src/cli

So as you can see, the setting is 200 MB, but the node process is currently at ~450 MB.

the stdout file shows:

[17468]  2514028 ms: Scavenge 120.5 (162.5) -> 111.5 (165.5) MB, 13.6 ms [allocation failure].
[17468] Memory allocator,   used: 169444 KB, available: 100892 KB
[17468] New space,          used:   3467 KB, available:  12916 KB, committed:  32768 KB
[17468] Old pointers,       used:  70437 KB, available:      0 KB, committed:  72446 KB
[17468] Old data space,     used:  27191 KB, available:    308 KB, committed:  28409 KB
[17468] Code space,         used:  10659 KB, available:  11011 KB, committed:  21912 KB
[17468] Map space,          used:   2252 KB, available:   5696 KB, committed:   8190 KB
[17468] Cell space,         used:    110 KB, available:     12 KB, committed:    128 KB
[17468] PropertyCell space, used:     56 KB, available:      7 KB, committed:     64 KB
[17468] Large object space, used:      0 KB, available:  99851 KB, committed:      0 KB
[17468] All spaces,         used: 114175 KB, available:  29952 KB, committed: 163917 KB
[17468] External memory reported:   -188 KB
[17468] Total time spent in GC  : 447.7 ms
[17468]  2654927 ms: Scavenge 124.2 (165.5) -> 115.2 (168.5) MB, 12.9 ms [allocation failure].
[17468] Memory allocator,   used: 172516 KB, available:  97820 KB
[17468] New space,          used:   3474 KB, available:  12909 KB, committed:  32768 KB
[17468] Old pointers,       used:  73505 KB, available:      0 KB, committed:  75469 KB
[17468] Old data space,     used:  27778 KB, available:    149 KB, committed:  28409 KB
[17468] Code space,         used:  10698 KB, available:  10971 KB, committed:  21912 KB
[17468] Map space,          used:   2321 KB, available:   5628 KB, committed:   8190 KB
[17468] Cell space,         used:    110 KB, available:     12 KB, committed:    128 KB
[17468] PropertyCell space, used:     56 KB, available:      7 KB, committed:     64 KB
[17468] Large object space, used:      0 KB, available:  96779 KB, committed:      0 KB
[17468] All spaces,         used: 117945 KB, available:  29679 KB, committed: 166940 KB
[17468] External memory reported:   -201 KB
[17468] Total time spent in GC  : 460.6 ms

And it thinks there is space remaining in the "Old data space".
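
To put numbers on the mismatch, here is a rough sketch that parses the "All spaces" line from the --trace-gc-verbose output and compares V8's committed total with the RSS from `ps`. The regex is my guess at the line layout based only on the log above, `parseSpaceLine` is a made-up helper, and the `ps` sample and the log lines were not taken at the same instant, so treat the result as approximate.

```javascript
// Sketch: compare V8's committed heap (from --trace-gc-verbose output)
// with the RSS reported by ps. parseSpaceLine is a hypothetical helper and
// the regex assumes the line layout seen in the log above.
function parseSpaceLine(line) {
  const m = line.match(
    /^\[(\d+)\]\s+(.+?),\s+used:\s*(\d+) KB, available:\s*(\d+) KB(?:, committed:\s*(\d+) KB)?/
  );
  if (!m) return null;
  return {
    pid: Number(m[1]),
    space: m[2],
    usedKB: Number(m[3]),
    availableKB: Number(m[4]),
    committedKB: m[5] === undefined ? null : Number(m[5]),
  };
}

const allSpaces = parseSpaceLine(
  '[17468] All spaces,         used: 114175 KB, available:  29952 KB, committed: 163917 KB'
);
const rssKB = 459236; // RSS column from the ps output above

// Memory resident in the process that V8 does not count as committed heap.
const outsideHeapKB = rssKB - allSpaces.committedKB;

console.log('V8 committed (MB):', (allSpaces.committedKB / 1024).toFixed(1));
console.log('process RSS  (MB):', (rssKB / 1024).toFixed(1));
console.log('outside heap (MB):', (outsideHeapKB / 1024).toFixed(1));
```

By these figures V8's committed heap is under the 200 MB setting, and most of the process footprint sits outside what the GC trace accounts for.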

