Anyone else seeing memory issues with Java _121?

As per the title: I had no issues with Java _111, but with _121 I'm getting oom-killer hits :frowning: Anyone else?

What issues exactly? Do we have a particular stack trace that lines up with a particular version?

Not sure what to post really... here's what I have via syslog (pastebin link due to size):

Java oom

Again, I'd been running fine since October with Java _111, so my guess is Java is leaking somewhere. Thank you.

What version of Elasticsearch?

Latest...5.1.1.

I'd heard (anecdata) that _121 was a lot more efficient.

Hrmm... well, I'll see if it happens again. It's happened twice so far with the same results in syslog.

Synopsis:
The logs provided are from 22/1 - is this a rare occurrence?
There is a mention of GNU Krell Monitors (GKrellM) running - are you facing any issues at the operating system level, such as applications freezing?

Is this running on the iMac directly, or is the iMac just the host hardware for Ubuntu 14.04.1 (desktop?) running kernel 4.4.0-59-generic?

Yeah, that log was the first time I'd ever seen the oom-killer, truth be told. No other issues... just the sudden killing of that Java PID. This is Ubuntu 14.04.1 64-bit running on an old iMac:

Linux 4.4.0-59-generic #80~14.04.1-Ubuntu SMP Fri Jan 6 18:02:02 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux

Thank you.

Please check the free memory on the server and your Elasticsearch heap settings (-Xmx etc.).
If there is not enough memory for the JVM, the system will kill it.
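For reference, a quick way to sanity-check both of those (a rough sketch - the jvm.options path assumes a deb/rpm install of Elasticsearch 5.x):

# free system memory in MB
free -m
# current heap settings; as a rule of thumb, keep -Xmx at or below ~50% of RAM
grep -E '^-Xm[sx]' /etc/elasticsearch/jvm.options
# example output (values are placeholders, not a recommendation):
# -Xms2g
# -Xmx2g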

Thanks... my question, though, is whether anyone else has noticed this since a Java upgrade. I didn't have this issue previously; if I had, then yes, I would agree that system memory would be the issue. But I'm only seeing this after upgrading from Java _111 to _121, never before the upgrade. Thanks again.

This is the first I am hearing of this on Java _121.
Given that this is an older iMac running Ubuntu (can you confirm server or desktop?), we also can't rule out other operating-system-level conflicts related to the Java change - for example, the GNU Krell Monitors (GKrellM) that appear in the log, even though the kernel is reported as not tainted.

This is Ubuntu Server 14.04. Thank you... it might have been just a fluke. I'll continue monitoring and post results.

We're seeing frequent OOM kills for our cluster nodes as well. We had cluster halts before _121 (nodes got stuck in a GC loop, still reporting to the master but failing all queries); now with _121 the nodes at least terminate cleanly.

Linux es-big-16 4.4.0-59-generic #80~14.04.1-Ubuntu SMP Fri Jan 6 18:02:02 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
oom kern log

We already reduced the query cache size to 5%, the index buffer to 4%, and limited the fielddata cache to 50% (with the breaker set to 57%) - roughly the settings sketched below. We are still trying to find the cause of our issues: maybe our bulk queue size is too long (100k, though it's hardly used), or our bulk requests are too big, or we are doing too many requests per second per node... fishing in muddy waters.
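For anyone following along, this is roughly what those limits would look like in elasticsearch.yml on 5.x. The setting names are the standard ones as I recall them from the docs - double-check them against your exact version before copying anything:

indices.queries.cache.size: 5%
indices.memory.index_buffer_size: 4%
indices.fielddata.cache.size: 50%
indices.breaker.fielddata.limit: 57%
# the bulk queue size mentioned above (100k); the 5.x default is far smaller
thread_pool.bulk.queue_size: 100000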

We had no issues on 2.4.x; it all started after upgrading to 5.1.1, and 5.1.2 didn't help either.

With the versions reported in this thread (4.4.0-59), this is due to an Ubuntu kernel bug. You should downgrade the kernel to 4.4.0-57.
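In case it helps, a rough sketch of the downgrade on Ubuntu 14.04, assuming the lts-xenial/HWE generic kernel flavour (adjust the package names to match whatever uname -r reports on your machine):

# install the older kernel alongside the current one
sudo apt-get update
sudo apt-get install linux-image-4.4.0-57-generic linux-image-extra-4.4.0-57-generic
# reboot and pick 4.4.0-57 from the GRUB menu (or set it as the default),
# and hold back the affected -59 image until a fixed kernel lands:
sudo apt-mark hold linux-image-4.4.0-59-generic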


D'oh, thanks for the info - I also just noticed that our jvm.options had been untouched since 5.1.1. I added the missing netty parameter for the GC deactivation and restarted all nodes. Will post an update tomorrow.

Yeah, genuinely, thanks for the info... I can wait for the kernel fix.

What are your current jvm.options settings, Andre? Thanks.

Check if the following parameter exists in your jvm.options (on a side note: is there a path where we can store our customized values to avoid sed-fiddling within the jvm.options?):

-Dio.netty.recycler.maxCapacityPerThread=0
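A quick way to verify it is both present in the config and actually picked up by the running node (the path assumes a deb/rpm install):

# is the flag in the config file?
grep maxCapacityPerThread /etc/elasticsearch/jvm.options
# did the running JVM get it after the restart?
ps -ef | grep -o 'io.netty.recycler.maxCapacityPerThread=[0-9]*'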

We lost 4 of our nodes overnight, so it probably is indeed the kernel bug. The nodes running on the proposed 4.4.0-62 kernel didn't crash.

On another side note: has anybody noticed performance regressions on 14.04/16.04 with the 4.4 kernel? Our nodes running on the wily 4.2 kernel are performing 20-30% better (lower CPU usage, shorter GC times, faster index and search times, lower load average) and still at least 10% better than our 16.04 nodes...

We upgraded to 5.2 and I added additional nodes to the proposed-kernel pool. The downgraded-kernel nodes are looking good, but we lost one node in the 4.4.0-62 kernel pool, so it's probably not fixed yet, sigh (the last comment in the kernel bug discussion indicates that as well).
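For anyone who wants to test the candidate kernel themselves, a rough sketch of pulling it from trusty-proposed on 14.04 (the exact image version to install is whatever the kernel bug currently names as the candidate, 4.4.0-62 here; consider pinning the pocket so only the kernel comes from it):

# enable the proposed pocket
echo "deb http://archive.ubuntu.com/ubuntu/ trusty-proposed main restricted universe multiverse" | sudo tee /etc/apt/sources.list.d/trusty-proposed.list
sudo apt-get update
sudo apt-get install linux-image-4.4.0-62-generic linux-image-extra-4.4.0-62-generic
# reboot into the new kernel and confirm with: uname -r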

I guess we should have stayed with CentOS... :slight_smile: