We are running our setup in Kubernetes. During load testing, we found that the Service pod was being restarted frequently, and the root cause was OOM.
Observation: This does not look like a memory leak, as we could not find a leak pattern in the long-running pod. It seems more likely that some processes consume a lot of heap under load.
To debug further, we took heap dumps of the running pod and found that APM was consuming more than 40% of the retained heap. The APM threads were also in the blocked state in the thread dump.
I am attaching heap dump snapshots and the relevant details for further debugging.
113,652,168 bytes (35.77 %) of Java heap is used by 3,071 instances of java/util/concurrent/ConcurrentHashMap$Node
co/elastic/apm/agent/shaded/bytebuddy/pool/TypePool$CacheProvider$Simple at 0x7fd1b35a0
Unless you want to increase the heap size, you would need to disable type-pool caching by setting the enable_type_pool_cache config option to false. This option is not documented, because it is very rarely changed from the default.
You can set it in one of the following ways:
in the agent config file: enable_type_pool_cache=false
as a system property on the command line: -Delastic.apm.enable_type_pool_cache=false
as an environment variable: ELASTIC_APM_ENABLE_TYPE_POOL_CACHE=false
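In a Kubernetes deployment it can be hard to tell whether a flag actually reached the JVM. The following is a minimal sketch, not part of the agent, that prints what the running process sees for the system property and the environment variable named above. The class name TypePoolCacheSettingCheck and the describe helper are made up for illustration, and the sketch does not inspect the agent config file.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Properties;

class TypePoolCacheSettingCheck {

    // Names copied from the options listed above.
    static final String SYSTEM_PROPERTY = "elastic.apm.enable_type_pool_cache";
    static final String ENV_VARIABLE = "ELASTIC_APM_ENABLE_TYPE_POOL_CACHE";

    /** Collects the value seen for each source; a null value means "not set". */
    static Map<String, String> describe(Map<String, String> env, Properties props) {
        Map<String, String> found = new LinkedHashMap<>();
        found.put("system property " + SYSTEM_PROPERTY, props.getProperty(SYSTEM_PROPERTY));
        found.put("environment variable " + ENV_VARIABLE, env.get(ENV_VARIABLE));
        return found;
    }

    public static void main(String[] args) {
        // Print what this JVM process actually received.
        describe(System.getenv(), System.getProperties())
            .forEach((source, value) ->
                System.out.println(source + " = " + (value == null ? "<not set>" : value)));
    }
}
```

Running this inside the pod with the same JVM arguments and environment as the service should show whether either setting is visible to the process.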
It may make startup time longer, but based on your configured heap, I assume this is not a huge app, so it wouldn't necessarily be an issue.
Also note that the type cache is cleared once it has not been accessed for a minute. The type pool cache is usually only used on startup, so after your app has warmed up, it should be cleared automatically. In addition, the cache entries are referenced via SoftReferences, so they get cleared automatically if the JVM heap usage approaches the limit.
As @felixbarny noted, this cache should not cause OOM, and it should not consume any heap after the application has been up for a while.
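To illustrate the two mechanisms described here (clearing after an idle period, and soft references that the GC may reclaim under memory pressure), here is a minimal sketch. It is not the agent's actual implementation; the class IdleSoftCache and all of its methods are hypothetical, and timestamps are passed in explicitly only to keep the example easy to follow.

```java
import java.lang.ref.SoftReference;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

class IdleSoftCache<K, V> {

    private final long maxIdleMillis;
    private final Map<K, SoftReference<V>> entries = new ConcurrentHashMap<>();
    private volatile long lastAccessMillis;

    IdleSoftCache(long maxIdleMillis, long nowMillis) {
        this.maxIdleMillis = maxIdleMillis;
        this.lastAccessMillis = nowMillis;
    }

    void put(K key, V value, long nowMillis) {
        lastAccessMillis = nowMillis;
        // The value is only softly reachable through the cache,
        // so the GC may reclaim it when heap is running low.
        entries.put(key, new SoftReference<>(value));
    }

    /** Returns the cached value, or null if absent or already reclaimed by the GC. */
    V get(K key, long nowMillis) {
        lastAccessMillis = nowMillis;
        SoftReference<V> ref = entries.get(key);
        return ref == null ? null : ref.get();
    }

    /** Meant to be called periodically; drops all entries once the cache has been idle long enough. */
    boolean clearIfIdle(long nowMillis) {
        if (nowMillis - lastAccessMillis >= maxIdleMillis) {
            entries.clear();
            return true;
        }
        return false;
    }

    int size() {
        return entries.size();
    }
}
```

In this sketch, a cleared soft reference leaves its map entry in place until clearIfIdle removes it, and any access resets the idle timer, so a cache that keeps being accessed is never cleared by the idle check.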
We think we know why this wasn't behaving as expected. Please try to use this fix snapshot, without the enable_type_pool_cache config, and let us know if this resolves the issue.
Thanks!