Memory Leak using InternalSearchHit.sourceAsMap

I'm seeing odd behavior in a project I'm working on: when I reference an InternalSearchHit.sourceAsMap directly, I get a memory leak. The code does some graph traversal, so the map also ends up holding references to the sourceAsMap of other InternalSearchHit documents.

I'm writing Groovy code, and the relevant part looks like this:

 def docMap = hit.sourceAsMap() 
 //Add data into hash map
 //Add other InternalSearchHit.sourceAsMap as child of map

I end up with a memory leak.

If I copy the map using the HashMap constructor:

 def docMap = new HashMap(hit.sourceAsMap())
 //Add data into hash map
 //Add other InternalSearchHit.sourceAsMap as child of map

Then the memory leak goes away.
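For reference, the HashMap copy constructor makes a shallow copy: the new map has its own top-level entries, but nested values are shared with the original. The following is a minimal Java sketch of that behavior (Groovy's new HashMap(...) calls the same java.util.HashMap constructor). The sample data and the names ShallowCopyDemo and sampleSource are made up for illustration and are not taken from Elasticsearch.

```java
import java.util.HashMap;
import java.util.Map;

public class ShallowCopyDemo {

    // Builds a small source-like map with one nested map (made-up data).
    static Map<String, Object> sampleSource() {
        Map<String, Object> nested = new HashMap<>();
        nested.put("city", "Oslo");
        Map<String, Object> source = new HashMap<>();
        source.put("name", "doc-1");
        source.put("address", nested);
        return source;
    }

    public static void main(String[] args) {
        Map<String, Object> original = sampleSource();
        Map<String, Object> copy = new HashMap<>(original);

        // A key added to the copy does not appear in the original.
        copy.put("child", new HashMap<String, Object>());
        System.out.println(original.containsKey("child")); // false

        // Nested values are the same object in both maps.
        System.out.println(copy.get("address") == original.get("address")); // true
    }
}
```

So the two versions above differ in which map object receives the added data: the one returned by the hit, or a separate top-level map.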

I took a look at the source for InternalSearchHit and didn't see anything glaring. My best guess is that referencing the sourceAsMap object keeps the SearchHit alive, which in turn keeps something else alive.

The direct reference to the map should not cause the SearchHit to stick around in the heap. The map contains no back reference to the SearchHit, so when the SearchHit is no longer referenced, it should be garbage collected (while the map itself will still reside in the heap due to your reference to it).

Did you try using a memory profiler to see which objects seem to be leaking?

I did use a memory profiler, and I did not see anything telling. Per the example above, that was the last step in the process, and after the method returned the docMap should have been GC'd. Basically the method could have been:

 docMap = hit.sourceAsMap()
 //Attach other hit.sourceAsMap() into the docMap hashmap
 return;

And I was still getting the memory leak until I replaced hit.sourceAsMap() with new HashMap(hit.sourceAsMap()).

I'm not ruling out that something in my own code is causing the memory leak. It just seems odd, given that from my code's perspective hit.sourceAsMap() and new HashMap(hit.sourceAsMap()) should be functionally equivalent. This is why I'm leaning towards something on the InternalSearchHit side.

One question: I did look at InternalSearchHit, and the map reference is cached by the InternalSearchHit. Is it possible that this has some effect? I realize it shouldn't, but I'm somewhat grasping at straws.

That's what I was looking at too, but the InternalSearchHit can be garbage collected even though you're holding a reference to the map. The InternalSearchHit can be GC'ed as soon as there are no more (strong) references to it, however the map wouldn't be GC'ed until you no longer referenced the map. The map is a standard Java HashMap, so there is no chance it contains any back-references to the InternalSearchHit.

What are the extra objects in the heap (according to the memory profiler) when you compare the heap using the first version vs the second version? Is it possible to just run your one function that gets the source map in isolation, comparing approaches (1) and (2) and seeing what the extra objects in the heap are that aren't being GC'ed?

The majority of the data in the heap is char[] and byte[], primarily from Elasticsearch inbound/outbound. Beyond that, I don't really know what to say. The code involved is purely functional, so there are no references apart from what is on the stack.

I will note that if I skip assigning other hits' sourceAsMap into my docMap, the memory leak also seems to disappear, though it could just be that memory grows much more slowly when I'm not doing graph traversal.

I'm leaning towards the idea that having the nested structure is the main culprit, but I just can't see why new HashMap would solve that problem vs using the sourceAsMap object. As you said, it should just be a basic map that is exposed internally and should be free for GCing once the stack pops.
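One observable difference between the two versions can be sketched without Elasticsearch. The following is a minimal Java illustration, assuming (as described earlier in the thread) that the hit caches its source map and hands out the same instance on every call. FakeHit is a hypothetical stand-in, not the real InternalSearchHit, and the method names are invented for this example. It shows that a child attached to the returned map becomes reachable from the hit itself, while a child attached to a copy does not.

```java
import java.util.HashMap;
import java.util.Map;

public class CachedSourceDemo {

    // Hypothetical stand-in for a search hit that caches its parsed source.
    static final class FakeHit {
        private Map<String, Object> source;

        Map<String, Object> sourceAsMap() {
            if (source == null) {
                source = new HashMap<>();
                source.put("id", "doc");
            }
            return source; // same instance on every call
        }
    }

    // Version 1: attach the child to the map returned by the hit.
    static void attachDirect(FakeHit parent, FakeHit child) {
        Map<String, Object> docMap = parent.sourceAsMap();
        docMap.put("child", child.sourceAsMap());
    }

    // Version 2: attach the child to a shallow copy of that map.
    static void attachToCopy(FakeHit parent, FakeHit child) {
        Map<String, Object> docMap = new HashMap<>(parent.sourceAsMap());
        docMap.put("child", child.sourceAsMap());
    }

    public static void main(String[] args) {
        FakeHit a = new FakeHit();
        attachDirect(a, new FakeHit());
        System.out.println(a.sourceAsMap().containsKey("child")); // true

        FakeHit b = new FakeHit();
        attachToCopy(b, new FakeHit());
        System.out.println(b.sourceAsMap().containsKey("child")); // false
    }
}
```

In the first version, anything that still references the parent hit also keeps every attached child map reachable, even after the local docMap variable goes out of scope. Whether that accounts for the leak depends on what still holds the hits, which the thread does not establish.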