Memory usage unaccounted for?

Hello, ES has been great so far (version 1.5 on AWS); however, unless I'm missing some stats, some of the JVM memory usage seems unaccounted for.

I started a small CLI tool to summarize the results: https://github.com/tj/es, but from the JSON output it's unclear where the rest of the memory goes. As you can see, I'm not using much in the filter cache or field data (I'm using doc_values). The only thing that comes close is "index_writer_max_memory": "203.1mb", but that value shouldn't be sustained, right? Or is it the max within some given period?

      Documents: 5,350,248
         Memory: 730 MB free (34%) – 1.4 GB used (65%)
           Swap: 0 B used
       JVM Heap: 1.1 GB committed – 272 MB used
         JVM GC: 23 MB young – 1.5 MB survivor – 247 MB old
     Field Data: 24 MB (0 evictions)
   Filter Cache: 4.5 MB (0 evictions)
    Query Cache: 0 B (0 evictions)
       ID Cache: 0 B
       Segments: 4.7 MB (46) – 377 kB writer
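
For reference, those numbers come out of the node stats API; here's a rough sketch of pulling the relevant fields directly (it assumes a single node on localhost:9200 and the ES 1.x field names shown above; the exact keys may differ in other versions):

    # Rough sketch: dump the memory-related node stats the summary above is based on.
    # Assumes a node on localhost:9200 and ES 1.x field names (keys may differ elsewhere).
    import json
    from urllib.request import urlopen

    stats = json.load(urlopen("http://localhost:9200/_nodes/stats/jvm,indices"))

    for node in stats["nodes"].values():
        mem = node["jvm"]["mem"]
        idx = node["indices"]
        print(node["name"])
        print("  heap used / max  :", mem["heap_used_in_bytes"], "/", mem["heap_max_in_bytes"])
        print("  old gen used     :", mem["pools"]["old"]["used_in_bytes"])
        print("  fielddata        :", idx["fielddata"]["memory_size_in_bytes"])
        print("  filter cache     :", idx["filter_cache"]["memory_size_in_bytes"])
        print("  segments memory  :", idx["segments"]["memory_in_bytes"])
        print("  index writer max :", idx["segments"]["index_writer_max_memory_in_bytes"])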

Any insight would be great. I should note that CPU usage is very low (~15%) while memory keeps growing. I have 45% of the machine's memory allocated to the JVM in this case, and 60% in my larger cluster, which has similar problems.

cheers

Not sure where you see a "problem". What is "unaccounted" exactly? Of course memory keeps growing, because you have a running node. Java takes care of managing the heap, and field data and caches live on the heap.

Your committed heap is 1.1 GB, which matches the default setting of -Xmx1g; heap usage is 272 MB, which is also fine; and the Elasticsearch process size seems to be around 1.4 GB, which is fine as well.

If you want to know which Java objects make up the 272 MB of used heap, just create a heap dump, e.g. with the JDK tool jmap (jstack only gives you thread dumps). Or, if you want to visualize things, open the JMX port of the ES JVM and connect with the jvisualvm tool; then you can watch the heap, create dumps with a click, and more.
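
For example, something along these lines works on the node itself (a rough sketch; it assumes the JDK's jps and jmap are on the PATH and that exactly one Elasticsearch JVM is running):

    # Rough sketch: take a heap dump of the ES process with the JDK's jmap.
    # Assumes jps/jmap are on the PATH and a single Elasticsearch JVM is running.
    import subprocess

    # jps -l lists running JVMs with their main class, e.g.
    # "1234 org.elasticsearch.bootstrap.Elasticsearch".
    jps = subprocess.check_output(["jps", "-l"]).decode()
    pid = next(line.split()[0] for line in jps.splitlines()
               if "elasticsearch" in line.lower())

    # Dump live objects to a binary .hprof file that jvisualvm (or MAT) can open.
    subprocess.check_call(
        ["jmap", "-dump:live,format=b,file=es-heap-%s.hprof" % pid, pid])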

Field data and the caches report as very small, but there's 247 MB sitting in the old generation in this case, so I'm wondering what's causing the bloat there. If I leave the nodes running for a few days without resizing, heap usage ends up around ~95% and ES starts responding with timeouts, etc.
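
For what it's worth, this is roughly how I'm watching the ramp-up (a quick sketch against the node stats API, assuming localhost:9200 and the ES 1.x jvm stats fields):

    # Quick sketch: poll heap usage so the slow climb towards ~95% is visible.
    # Assumes a node on localhost:9200 and the ES 1.x jvm stats fields.
    import json
    import time
    from urllib.request import urlopen

    while True:
        stats = json.load(urlopen("http://localhost:9200/_nodes/stats/jvm"))
        for node in stats["nodes"].values():
            mem = node["jvm"]["mem"]
            old = mem["pools"]["old"]
            print("%s heap %d%%  old gen %d / %d bytes" % (
                node["name"], mem["heap_used_percent"],
                old["used_in_bytes"], old["max_in_bytes"]))
        time.sleep(60)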

Maybe I really do need more memory for the load, but it's unclear (to me) why whatever is sitting in the old generation is being held in memory. I'm enabling doc_values on more fields to see if that helps as well; it seems to a bit, but heap usage still ramps up to ~90-95%.
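
For reference, this is the kind of mapping change I mean (the index and field names are just illustrative, and as far as I understand doc_values can only be set when a field is first mapped, so existing data needs a reindex):

    # Rough sketch: create a new index with doc_values enabled on a couple of fields.
    # "events-v2" and the field names are illustrative; in ES 1.x doc_values can
    # only be set when a field is first mapped, so existing fields need a reindex.
    import json
    from urllib.request import Request, urlopen

    mapping = {
        "mappings": {
            "event": {
                "properties": {
                    "type":      {"type": "string", "index": "not_analyzed", "doc_values": True},
                    "timestamp": {"type": "date", "doc_values": True},
                }
            }
        }
    }

    req = Request("http://localhost:9200/events-v2",
                  data=json.dumps(mapping).encode(), method="PUT")
    print(urlopen(req).read().decode())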

As far as creating dumps goes, I'm probably out of luck with AWS hosting it, haha.

This is a very basic issue. The default settings are just for getting started; you have 5 million documents, so you need to increase capacity.

247 MB in the old generation of the heap is not bloat; it's the JVM doing exactly what it should, keeping long-lived objects around because they are still in use by ES. It doesn't even fill 20% of the available heap, and you call that bloat? The JVM won't run a full GC before an occupancy threshold is reached.

If you see the heap 95% full, all kinds of thresholds have been exceeded. You must increase your heap size; 95% simply means resources are exhausted.

The default 1g heap is very tiny, an absolute minimum. Use at least a 4g heap and ES can handle average indexing and querying with 5 million docs.
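
For example (a sketch, assuming the stock 1.x tarball layout, where bin/elasticsearch reads ES_HEAP_SIZE and applies it to both -Xms and -Xmx; a managed AWS service may not let you set this directly):

    # Rough sketch: start a node with a 4g heap. Assumes the stock 1.x tarball,
    # where bin/elasticsearch reads ES_HEAP_SIZE and sets both -Xms and -Xmx.
    # The ./bin path is illustrative; run from the Elasticsearch install directory.
    import os
    import subprocess

    env = dict(os.environ, ES_HEAP_SIZE="4g")
    subprocess.Popen(["./bin/elasticsearch"], env=env)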

You don't show your queries or your logs, so it's impossible to gather any evidence. If AWS hosting doesn't expose your logs, then you are doomed: you will never find out what fatal events are happening in your ES node.

That's what I'm trying to figure out; just saying "you need to increase capacity" sounds very unscientific to me. Of course the JVM needs to use the heap, but most of the relevant memory stats are reported (filter cache, field data, etc.), so I figured the major consumers would be exposed (even though the reported values don't add up to the totals) and would give more insight into how the system could be tuned to mitigate the issue.

They are small clusters (3 and 8 nodes at the moment), but unfortunately I can't afford to just throw more memory at the situation. I was reading up on doc values and they sounded promising in that regard, but they don't seem to have made any real impact. I should have mentioned that this is a purely analytical workload, so 5 million docs isn't that much for a columnar store; the larger cluster has about 250M, which is still not much in the grand scheme of things. Maybe I expected to squeeze too much out of them.

In my case as well I would be perfectly fine sacrificing query performance slightly in order to keep the costs down.

Anyway thanks for the responses! I appreciate it.