Memory issue (possibly non-heap memory)


(abhii-2) #1

I have been trying to diagnose a memory issue we are having on our
production servers for the past couple of days. One of the current
suspects is an Elasticsearch-related issue, and I need a little help to
confirm or reject this hypothesis.

The symptoms are as follows (version details at bottom of post). The
memory usage on our Tomcat server machine grows steadily without limit
until all memory, both physical and swap, is exhausted, and the OS
dies. The memory usage growth all comes from the single Tomcat server
process. The fact that memory consumption grows beyond the max heap
size points to non-heap memory growth.
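One way to check this from inside the JVM is to log heap and non-heap usage side by side. The following is only a minimal sketch using the standard java.lang.management API (available on Java 6); the class name MemorySnapshot is made up for illustration. As far as I know, the "non-heap" figure reported here covers JVM-managed pools such as the permanent generation and the code cache, and does not include direct buffers, thread stacks or other native allocations, so it should be compared against the process size reported by the OS.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.lang.management.MemoryUsage;

public class MemorySnapshot {
    /** Converts a byte count to whole megabytes (rounding down). */
    static long toMb(long bytes) {
        return bytes / (1024L * 1024L);
    }

    /** Builds a one-line summary of a MemoryUsage reading. */
    static String describe(String label, MemoryUsage usage) {
        // getMax() returns -1 when the JVM defines no maximum for this area
        String max = usage.getMax() < 0 ? "undefined" : toMb(usage.getMax()) + " MB";
        return label + ": used=" + toMb(usage.getUsed()) + " MB"
                + ", committed=" + toMb(usage.getCommitted()) + " MB"
                + ", max=" + max;
    }

    public static void main(String[] args) {
        MemoryMXBean memory = ManagementFactory.getMemoryMXBean();
        System.out.println(describe("heap", memory.getHeapMemoryUsage()));
        System.out.println(describe("non-heap", memory.getNonHeapMemoryUsage()));
    }
}
```

If both figures stay flat while the process keeps growing at the OS level, the growth is probably in memory the JVM does not report here.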

The only seemingly relevant logs (currently at INFO level) are
repeated blocks of the following:

2011-06-20 12:30:18,712 [New I/O server worker #1-6] WARN - [Destiny] received ping response with no matching id [84020]
2011-06-20 12:30:18,715 [New I/O server worker #1-5] WARN - [Destiny] received ping response with no matching id [84020]
2011-06-20 12:30:18,717 [New I/O server worker #1-8] WARN - [Destiny] received ping response with no matching id [84020]
2011-06-20 12:30:18,727 [New I/O server worker #1-5] WARN - [Kala] received ping response with no matching id [84009]
2011-06-20 12:30:18,764 [New I/O server worker #1-4] WARN - [Kala] received ping response with no matching id [84009]
2011-06-20 12:30:18,770 [New I/O server worker #1-3] WARN - [Kala] received ping response with no matching id [84009]
2011-06-20 12:30:18,780 [New I/O server worker #1-1] WARN - [Stacy, George] received ping response with no matching id [84043]
2011-06-20 12:30:18,784 [New I/O server worker #1-2] WARN - [Stacy, George] received ping response with no matching id [84043]
2011-06-20 12:30:18,793 [New I/O server worker #1-5] WARN - [Stacy, George] received ping response with no matching id [84043]

This happens periodically and seems to match the growth of memory
usage (this still needs to be verified, but it appears to be correct).
It seems to happen more often just after the server has
started and then at longer intervals.

Since Java NIO can allocate non-heap memory (e.g. direct ByteBuffers),
I am wondering if anyone could confirm that this could actually be the
cause of our leak.
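To illustrate the mechanism in question: a direct ByteBuffer is backed by native memory outside the Java heap, so it does not count against -Xmx. The sketch below shows one way to watch that pool. Note the assumptions: BufferPoolMXBean only exists from Java 7 onwards, so this would not run on the 1.6.0_24 JVM described in this post, and the class name DirectBufferProbe is hypothetical. It only demonstrates how direct memory is accounted for; it does not show that Elasticsearch or Netty is leaking.

```java
import java.lang.management.BufferPoolMXBean;
import java.lang.management.ManagementFactory;
import java.nio.ByteBuffer;
import java.util.List;

public class DirectBufferProbe {
    /** Bytes currently reserved by the JVM's "direct" buffer pool, or -1 if it is not reported. */
    static long directBytesUsed() {
        List<BufferPoolMXBean> pools = ManagementFactory.getPlatformMXBeans(BufferPoolMXBean.class);
        for (BufferPoolMXBean pool : pools) {
            if ("direct".equals(pool.getName())) {
                return pool.getMemoryUsed();
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        long before = directBytesUsed();
        // Allocated in native memory, not on the Java heap
        ByteBuffer buffer = ByteBuffer.allocateDirect(16 * 1024 * 1024);
        long after = directBytesUsed();
        System.out.println("buffer.isDirect() = " + buffer.isDirect());
        System.out.println("direct pool grew by " + (after - before) + " bytes");
    }
}
```

On a Java 6 JVM, one option is to cap direct memory with the -XX:MaxDirectMemorySize flag and see whether the process then fails with an OutOfMemoryError mentioning direct buffer memory instead of growing without limit.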

PLEASE NOTE: This is on our Tomcat server machine, not on our Elasticsearch
server machine.

The Tomcat startup command is:
/usr/lib/jvm/java-6-sun/bin/java \
  -Djava.util.logging.config.file=/var/lib/tomcat6/conf/logging.properties \
  -Djava.awt.headless=true \
  -server \
  -Xss1M -Xms2G -Xmx3G \
  -XX:NewSize=1G -XX:MaxPermSize=768M \
  -XX:+UseConcMarkSweepGC \
  -XX:+CMSIncrementalMode \
  -XX:CMSInitiatingOccupancyFraction=80 \
  -XX:+UseConcMarkSweepGC \
  -Djava.util.logging.manager=org.apache.juli.ClassLoaderLogManager \
  -Djava.endorsed.dirs=/usr/share/tomcat6/endorsed \
  -classpath /usr/share/tomcat6/bin/bootstrap.jar \
  -Dcatalina.base=/var/lib/tomcat6 \
  -Dcatalina.home=/usr/share/tomcat6 \
  -Djava.io.tmpdir=/tmp/tomcat6-tmp \
  org.apache.catalina.startup.Bootstrap start

The machine has 6 GB of physical memory and a 700 MB swap partition. As
you can see, -Xmx is 3G.
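As a rough sanity check, the configured ceilings can be added up and compared with what the machine has. This is a back-of-the-envelope sketch only. The heap, PermGen and stack sizes come from the startup command above, but two inputs are assumptions: that the direct-memory cap defaults to roughly the -Xmx value when -XX:MaxDirectMemorySize is not set (this varies by JVM version), and a hypothetical count of 200 threads. Code cache, GC structures and other native allocations are ignored.

```java
public class MemoryBudget {
    /** Sum of the configured memory ceilings, all in megabytes. */
    static long worstCaseMb(long heapMb, long permGenMb, long directMb,
                            int threads, long stackMbPerThread) {
        return heapMb + permGenMb + directMb + threads * stackMbPerThread;
    }

    public static void main(String[] args) {
        long heapMb = 3 * 1024;   // -Xmx3G
        long permGenMb = 768;     // -XX:MaxPermSize=768M
        long directMb = heapMb;   // ASSUMPTION: direct-memory cap defaults to about -Xmx
        int threads = 200;        // HYPOTHETICAL thread count, not measured
        long stackMb = 1;         // -Xss1M

        long machineMb = 6 * 1024 + 700; // 6 GB RAM + 700 MB swap

        long worst = worstCaseMb(heapMb, permGenMb, directMb, threads, stackMb);
        System.out.println("sum of JVM ceilings: " + worst + " MB");
        System.out.println("RAM + swap:          " + machineMb + " MB");
    }
}
```

Under these assumptions the ceilings add up to 7112 MB against 6844 MB of RAM plus swap, so the process could in principle exhaust the machine without any single limit being exceeded.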

We are using:
Ubuntu, kernel 2.6.32-30-server, 64-bit
Java(TM) SE Runtime Environment (build 1.6.0_24-b07)
Java HotSpot(TM) 64-Bit Server VM (build 19.1-b02, mixed mode)
Elasticsearch 0.16.2
Tomcat 6.0.24

Thanks a bunch.


(Shay Banon) #2

Are you using a Node client in the Tomcat app? If so, which discovery are you using, multicast or unicast?

On Wednesday, June 22, 2011 at 12:01 AM, abhii wrote:

I have been trying to diagnose a memory issue we are having on our
production servers for the past couple of days. One of the current
suspects is an elastic related issue and I need a little help to
confirm or reject this hypothesis.


(abhii-2) #3

The relevant code is:

// Load the cluster settings from elasticsearch.yml on the classpath
NodeBuilder nodeBuilder = NodeBuilder.nodeBuilder();
Settings settings = nodeBuilder.settings().loadFromClasspath("elasticsearch.yml").build();
// Join the cluster as a client node and obtain a Client from it
this.elasticNode = nodeBuilder.settings(settings).client(true).node();
this.client = this.elasticNode.client();

and the elasticsearch.yml is ...

cluster:
  name: ssga-app.intrepid.agile

That would default to multicast, right?

On Thu, Jun 23, 2011 at 4:41 AM, Shay Banon shay.banon@elasticsearch.com wrote:

Are you using Node client in the tomcat app? Which discovery are you
using if this is the case, multicast or unicast?


(abhii-2) #4

A few additional pieces of relevant information ...

The Java code pasted above is in the init() method of a Spring bean. There
is no other code in this init method.

Since my initial post I have been able to reproduce the fact that the ping
response logs start appearing when the non-heap memory starts to creep up.
To clarify, the server will be doing fine for hours and then the ping
response logs appear and the non-heap memory starts creeping up from then
on. However, there seems to be no clear correlation between the number and
timing of such log lines and the rate of increase in memory. In other words,
the only clear correlation is between the start of the memory
creep and the first appearance of these ping response logs, and I have no
idea whether there is any causal link between the two.

Further, the number of threads for the Tomcat process is relatively
static, so this does not appear to be related to runaway thread creation.
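For anyone wanting to run the same thread check from inside the JVM, here is a minimal sketch using the standard ThreadMXBean (available on Java 6). The class name ThreadCountProbe is made up for illustration. A steady live count combined with a fast-growing total-started count would suggest thread churn rather than a thread leak.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;

public class ThreadCountProbe {
    /** One-line summary of the JVM's thread counters. */
    static String snapshot(ThreadMXBean threads) {
        return "live=" + threads.getThreadCount()
                + ", peak=" + threads.getPeakThreadCount()
                + ", totalStarted=" + threads.getTotalStartedThreadCount();
    }

    public static void main(String[] args) {
        // Log this periodically and compare readings over time
        System.out.println(snapshot(ManagementFactory.getThreadMXBean()));
    }
}
```

Since each thread reserves a stack of up to the -Xss size outside the heap, a flat live count makes thread stacks an unlikely source of unbounded growth.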


(Shay Banon) #5

I don't know why the non-heap memory is increasing, but maybe things start to garbage collect poorly, causing things to slow down. Since it's running in Tomcat with "other" code, I'm not sure whether the problem is in ES or not...
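One way to test the garbage collection theory would be to log collector activity over time and see whether it changes when the ping warnings start. This is only a sketch using the standard GarbageCollectorMXBean API (available on Java 6); the class name GcProbe is hypothetical, and it is not something that has been run against the setup described in this thread.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

public class GcProbe {
    /** Total accumulated collection time across all collectors, in milliseconds. */
    static long totalCollectionTimeMs() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long time = gc.getCollectionTime(); // -1 if the collector does not report it
            if (time > 0) {
                total += time;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.println(gc.getName()
                    + ": collections=" + gc.getCollectionCount()
                    + ", timeMs=" + gc.getCollectionTime());
        }
        System.out.println("total GC time: " + totalCollectionTimeMs() + " ms");
    }
}
```

A sharp rise in collection count or time around the first ping warning would support the idea that long pauses are delaying the ping responses.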

On Friday, June 24, 2011 at 12:22 AM, Abhijit Inamdar wrote:

A few additional pieces of relevant information ...


(kelaban) #6

I see this is a fairly old thread, but I was wondering if you ever found a fix? I am currently seeing a similar issue that eventually leads to an OutOfMemoryError. The only difference is that I am using the TransportClient.


(Jérôme Gagnon) #7

I don't know the thread you are referring to, but are you using scoring
scripts? These caused me many headaches due to OOM errors... I think
there was a commit for that a while ago, so it depends on which version you
are running.

On Friday, March 15, 2013 12:39:53 AM UTC-4, Keith L wrote:

I see this is a fairly old thread, but I was wondering if you ever found a
fix? I am currently seeing a similar issue that eventually leads to an
OutOfMemoryError. The only difference is that I am using the TransportClient.


(acv2) #8

Hello guys, any update on this? I'm having a similar issue, which I have documented in detail at the link below, but I can't figure out why non-heap memory keeps growing until it gets flushed, at which point everything else stops for some time.

Everything is explained here; if you could take a look, it would be great!

http://stackoverflow.com/questions/36103454/elasticsearch-overloading-after-non-heap-mem-flush

