JVM process memory grows by an extra -Xmx16g while the Elasticsearch heap is stable

Hello,
We have a cluster of 5 ES servers with a "good" hardware config (Intel Xeon with 32 threads, 64GB of RAM, 4 x SSD in RAID0, ...).
We are using the latest version of ES (2.3.3) and the latest JVM (1.8.0_91) on Debian 8.5 with the backports kernel (4.5.4-1~bpo8+1). We have only 2 plugins installed: kopf and Elastic HQ.

Today we have more than 1 billion documents in this cluster, and we are indexing into it in real time.
Once per day, a batch job searches the cluster and dumps the results to a file.

While this request is running, we can see the JVM process memory increasing (+8GB the first time, then less). At some point the size becomes "stable": one extra time the size of the Elasticsearch heap, so 16GB of Elasticsearch heap + 16GB of additional JVM memory.
My guess is that the JVM somehow fork()s itself during this request and keeps that memory until it reaches the same size as the ES heap.
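To check that guess, I could look for child processes and at the thread count of the JVM, something like this (assuming $ES_PID holds the Elasticsearch process ID):

# any child processes of the Elasticsearch JVM? (normally there are none)
ps --ppid $ES_PID
# number of threads inside the JVM process (the JVM creates threads, not separate processes)
ls /proc/$ES_PID/task | wc -l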

I ran pmap and found some memory mappings outside the Elasticsearch heap that account for this amount of memory:
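For reference, the command was something like this (assuming $ES_PID holds the Elasticsearch process ID):

# -x prints the extended format with the Kbytes/RSS/Dirty columns shown below
pmap -x $ES_PID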

Address Kbytes RSS Dirty Mode Mapping
00000003c0000000 16784016 16781008 16781008 rw--- [ anon ] <--- ES HEAP (-Xmx16g)
00007fc65b272000 238884 238884 238884 rw--- [ anon ]
00007fc6c60a9000 240928 240928 240928 rw--- [ anon ]
00007fc7cb34c000 439368 439368 439368 rw--- [ anon ]
00007fccbaad7000 742564 742564 742564 rw--- [ anon ]
00007fcceea00000 493044 493044 493044 rw--- [ anon ]
00007fcdc14d5000 228968 228968 228968 rw--- [ anon ]
00007fcf233a6000 531224 103220 0 r--s- _361c_Lucene50_0.tim
00007fcfa6f02000 220544 220544 220544 rw--- [ anon ]
00007fcfc15a9000 220920 220920 220920 rw--- [ anon ]
00007fd083105000 259508 259508 259508 rw--- [ anon ]
00007fd0ceaf4000 211276 211276 211276 rw--- [ anon ]
00007fd17b0f1000 247504 247504 247504 rw--- [ anon ]
00007fd240803000 595620 117452 0 r--s- _3eft_Lucene50_0.tim
00007fd4ea943000 249368 249368 249368 rw--- [ anon ]
00007fd5b7638000 579996 120492 0 r--s- _38rc_Lucene50_0.tim
00007fd6ac11d000 199888 199888 199888 rw--- [ anon ]
00007fd782c31000 583284 114272 0 r--s- _3cyd_Lucene50_0.tim
00007fd8191cf000 521080 521080 521080 rw--- [ anon ]
00007fd83ae42000 220816 220816 220816 rw--- [ anon ]
00007fd86978e000 266704 266704 266704 rw--- [ anon ]
00007fd8965f6000 219136 219136 219136 rw--- [ anon ]
00007fd8c8dfb000 211032 211032 211032 rw--- [ anon ]
00007fd90dbd6000 941228 941228 941228 rw--- [ anon ]
00007fd967076000 232476 232476 232476 rw--- [ anon ]
00007fd9d7fa6000 213620 213620 213620 rw--- [ anon ]
00007fd9fb13c000 444244 444244 444244 rw--- [ anon ]
00007fda21c6e000 218420 218420 218420 rw--- [ anon ]
00007fda32f6f000 577536 107624 0 r--s- _3c66_Lucene50_0.tim
00007fda718b6000 218948 218948 218948 rw--- [ anon ]
00007fda80cf9000 226936 226936 226936 rw--- [ anon ]
00007fda94767000 487684 487684 487684 rw--- [ anon ]
00007fdad2462000 224888 224888 224888 rw--- [ anon ]
00007fdb08203000 207744 207744 207744 rw--- [ anon ]
00007fdbe6fe4000 223088 223088 223088 rw--- [ anon ]
00007fdc456e0000 430312 430312 430312 rw--- [ anon ]
00007fdc8258f000 253416 253416 253416 rw--- [ anon ]
00007fdcb2c8c000 478532 478532 478532 rw--- [ anon ]
00007fdcd41e6000 690736 690736 690736 rw--- [ anon ]
00007fdd02d26000 234792 234792 234792 rw--- [ anon ]
00007fdd1d611000 233156 233156 233156 rw--- [ anon ]
00007fdd53302000 205264 205264 205264 rw--- [ anon ]
00007fdd6083d000 210696 210696 210696 rw--- [ anon ]
00007fdd8102f000 229252 229252 229252 rw--- [ anon ]
00007fdd911a4000 467748 467748 467748 rw--- [ anon ]
00007fddb0190000 232020 232020 232020 rw--- [ anon ]
00007fddde2c7000 218828 218828 218828 rw--- [ anon ]
00007fde2280f000 193740 193740 193740 rw--- [ anon ]
00007fde5410c000 224136 224136 224136 rw--- [ anon ]
00007fde6ce7e000 217624 217624 217624 rw--- [ anon ]
00007fde95b13000 234420 234420 234420 rw--- [ anon ]
00007fdf364b3000 211376 211376 211376 rw--- [ anon ]
00007fdf64a0a000 204068 204068 204068 rw--- [ anon ]
00007fe3c002a000 327508 295204 295204 rw--- [ anon ]
total kB 280674448 38648108 34243992

How can I find where the problem comes from? What are these memory areas used for?
I could start gdb and dump these memory zones, but I'm not sure it would be useful.
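What I have in mind with gdb is roughly this (the addresses are placeholders for the start and end of one of the anonymous regions listed above):

# attach to the running Elasticsearch JVM (it is paused while gdb is attached)
gdb -p $ES_PID
# inside gdb: write one anonymous region to a file, then detach
(gdb) dump binary memory /tmp/region.bin 0x<start_addr> 0x<end_addr>
(gdb) detach
(gdb) quit
# look for anything readable in the dump
strings /tmp/region.bin | less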

Thanks for your help,
Damien

I am confused about what you call the JVM memory and the Elasticsearch heap. Shouldn't these be the same?

For me, JVM memory = the memory of the Java process on my Linux server (the one you can see with the ps command, for example).
Elasticsearch heap = the Elasticsearch memory inside my Java process (the one you can see with jmap, for example).

So my JVM process is using 32GB (while my JVM settings are -Xms16g -Xmx16g) and my ES heap (inside the JVM) is at 16GB.
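Concretely, the two numbers I am comparing come from commands like these (assuming $ES_PID is the Elasticsearch process ID):

# "JVM memory": resident size of the whole Java process, in KB
ps -o rss= -p $ES_PID
# "Elasticsearch heap": the Java heap configuration and usage inside that process
jmap -heap $ES_PID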

Thanks,
Damien

Elasticsearch uses not only the heap but also off-heap memory buffers, because of Lucene.

If you have mmapfs-enabled indices (see https://www.elastic.co/guide/en/elasticsearch/reference/current/index-modules-store.html), Elasticsearch uses mmap() to map index files into main memory. No need to worry.

https://www.elastic.co/guide/en/elasticsearch/guide/master/heap-sizing.html#_give_less_than_half_your_memory_to_lucene
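If you want to check which store type an index uses, or set it explicitly when creating an index, it is something like this (the index names are placeholders):

# show the settings of an index; no explicit index.store.type means the platform default, which uses mmap on 64-bit Linux
curl -s 'localhost:9200/your_index/_settings?pretty'
# create an index that uses NIO instead of mmap (the store type of an existing index cannot be changed without reindexing)
curl -XPUT 'localhost:9200/new_index' -d '{"settings": {"index.store.type": "niofs"}}'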

Jörg is right: Lucene's use of mmap consumes some virtual address space, but not actual physical memory. See also http://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html for more information.
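One way to see the difference on Linux is to compare the virtual and resident sizes of the process, for example (assuming $ES_PID is the Elasticsearch PID):

# VmSize = virtual address space (includes every mmapped index file), VmRSS = actual physical memory in use
grep -E 'VmSize|VmRSS' /proc/$ES_PID/status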

I read the entries of your pmap listing (for example the _*_Lucene50_0.tim mappings), and I think these are the mapped index files that the OS decided to load into real memory. Just a guess, because I don't see the query being executed. If this is true

[quote="damien_desmarets, post:1, topic:55413"]
One time per day we have a batch searching inside the ES cluster
[/quote]

then it looks like a normal occurrence, i.e. the queries might exercise the whole data set, so the OS must manage almost the whole Lucene file set of the indices.

Hey guys,
First, thanks for your answers.

I just read the Lucene blog post, and I already knew that Lucene/ES uses the file system cache (through MMapDirectory).
That's why in my memory graph you can see: free memory (in green) + used memory (in red) + cached memory (the FS cache, in blue).
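The graph basically plots what free reports, e.g.:

# the 'cached' column is the FS cache used by the mmapped Lucene files;
# the '-/+ buffers/cache' row shows the memory really used by processes such as the JVM
free -m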

You can see that the used memory (process memory) increases once per day (when my batch program runs). It is my JVM process memory growing until it reaches 2 times the size of -Xmx16g = 32GB. After that, it does not move at all and stays stable.

For information, my cluster size is 1.68TB without replicas.