Elastic-apm-agent jdbcHelper seems to use a lot of memory

Firstly, I'm not sure how i replicate this problem apologies in advance...

Looking into memory usage on an app at the moment because it seems to be using a bunch more memory than it should be...

Drilling in with the profiling tool, the main usage seems to be coming from a ConcurrentHashMap in co.elastic.apm.agent.jdbc.helper.jdbcHelper.

Inside that class is a static list maintaining all sql ever run by the looks...

This list will grow and grow and none of the referenced objects can ever be garbage collected (referenced by a static object), right?

If there was a TTL or something on the list, or something removed queries at some stage, i think this would help the memory footprint a lot. Our current co.elastic.apm.agent.jdbc.helper.JdbcHelper retains nearly 8GB of memory...

APM Agent language and version:
java - 1.8.0

** $ java -version

java version "1.8.0_161"

Java(TM) SE Runtime Environment (build 1.8.0_161-b12)

Java HotSpot(TM) 64-Bit Server VM (build 25.161-b12, mixed mode)

Hi Paul and welcome to the APM forum :slight_smile:

This problem you report is troubling...

A short explanation: the JdbcHelper holds a WeakConcurrentMap, not a ConcurrentHashMap. The difference is that the former uses weakly-referenced keys, which means it should not prevent Statements from being garbage collected. We don't actively remove entries from this map, as we rely on the map to remove them once (or shortly after) the Statement objects (used as keys) are GCed.

In order to explore what happens there further, it will be useful to get some more info:

Just to make sure- does this mean it behaves differently with and without agent? Did you see a difference between former agent version and 1.8.0?

There are generally two ways to explain these symptoms: either there is a bug in the map implementation or the way we use it, or something other than the map is causing this statement leak, and you see that in the map, because the statements do not get GCed.

Using your heap-analysis tool, please try to look at map entries of statements that shouldn't be there (meaning - should have been cleared already) and see:

  1. The key should be of type com.blogspot.mydailyjava.weaklockfree.WeakConcurrentMap.WeakKey, which extends WeakReference. Is the referent field of the key null?
  2. If referent is not null, try to see what else if referencing it. That would be the most useful thing we can have for further analysis.

I'll do some digging, thanks for the information, Eyal. If I don't get to it tomorrow, i'll have some services that have run a few days without elastic_apm jdbc enabled, and that may shed some light if they show similar footprints (ie. it being an 'us' problem which i'm rather hoping for).
We did make a couple of changes at the same time, so I'm hoping I'm just blaming the wrong thing, and your great description may help narrow it down, so I really appreciate that.

If I don't reply tomorrow, I'll be replying on the flip side in the new working week!

Paul.

Sure, thanks! Looking forward to hear back

I can't see any other references :confused:



I don't do this kind of digging a huge amount, so i've included several screenshots (rather than wait on forum tag :))

Hopefully you can see something from these?

One 'interesting' (to me at least) thing is that there's a 'weak-ref-cleaner-2' thread in the second screenie listed as terminated.

1 Like

I think you are onto something here. Could you look into your logs (probably catalina.out) for things like Exception in thread and weak-ref-cleaner-2. Hopefully that reveals why the thread died. My suspicion is that an exception is thrown here: WeakConcurrentMap.java#L254 which terminates the thread because it's not caught.

Certainly looks like the cleaner thread death is the root cause- seems that Statement objects are garbage collected and the SQL strings are leaking: