We were testing Elasticsearch with an index of almost 1.5 TB. A separate Java process runs on the same machine. It does heavy data processing and submits batch indexing jobs to Elasticsearch. The machine has around 76 GB of RAM; the Java process is given a 6 GB max heap and Elasticsearch is given 24 GB. Normal RAM usage is around 70%. While we query Elasticsearch, used RAM goes beyond 95%. I assume this is expected, since Lucene memory-maps its index files and those files get loaded into memory. What is troubling is that when this happens, it affects the other Java program and causes long garbage collection pauses.

We ran a couple of tests and found that swapping had a large impact on the garbage collection performance of that Java program. So we did more testing with a simple program that creates large memory maps, and measured garbage collection performance to find out whether mmapped files can cause memory pressure. Here is what we found:
- When an mmap is created and loaded into memory and no further reads or writes are made to the file, the memory used by the mmap was eventually paged out, and there did not seem to be much impact on garbage collection performance.
- When an mmap is created and many reads are made on it, it has a considerable impact on garbage collection performance.
Since Elasticsearch performs a lot of read operations on its index files, I am guessing this is why garbage collection performance suffered.
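For reference, the two access patterns described above can be sketched roughly as follows. This is a minimal sketch in Python for brevity, not our actual test program: the function names (`create_file`, `map_and_touch_once`, `map_and_read_repeatedly`), the file size, and the pass count are hypothetical. It only generates the mmap load; the garbage collection pauses themselves would have to be measured separately in the Java process (for example, from the JVM's GC logs).

```python
import mmap
import os
import tempfile
import time

PAGE = mmap.PAGESIZE  # OS page size, typically 4096 bytes


def create_file(path, size_bytes):
    """Create a file of the given size filled with 0x01 bytes."""
    chunk = b"\x01" * PAGE
    with open(path, "wb") as f:
        written = 0
        while written < size_bytes:
            n = min(PAGE, size_bytes - written)
            f.write(chunk[:n])
            written += n


def map_and_touch_once(path, idle_seconds=0.0):
    """Pattern 1: map the file, fault every page in once, then leave it idle."""
    with open(path, "rb") as f, mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
        total = 0
        for off in range(0, len(m), PAGE):
            total += m[off]  # reading one byte faults in the whole page
        time.sleep(idle_seconds)  # keep the mapping open but untouched
        return total


def map_and_read_repeatedly(path, passes):
    """Pattern 2: keep reading the mapping so its pages stay hot in memory."""
    with open(path, "rb") as f, mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
        total = 0
        for _ in range(passes):
            for off in range(0, len(m), PAGE):
                total += m[off]
        return total


if __name__ == "__main__":
    size = 64 * 1024 * 1024  # 64 MiB; scaled down, the real test used large maps
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "data.bin")
        create_file(path, size)
        print(map_and_touch_once(path))
        print(map_and_read_repeatedly(path, passes=10))
```

Each returned value is the number of page touches, since every byte in the file is 1. To reproduce the memory pressure, the file would need to be larger than the free RAM left after both JVM heaps.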
This leads to my questions:
- Is it possible for Elasticsearch indices that use memory mapping to cause memory pressure on the machine, given that the operating system can page out parts of the mapping whenever needed? Our tests with simple mmap programs suggest this can happen, but are we missing something in our tests?
- Is there any way to limit Elasticsearch's memory-map usage so that the Java application is not paged out due to lack of memory?
- Is it recommended to run a real-time application on the same machine as Elasticsearch?
- Will running the Elasticsearch instance in a separate VM on the same physical machine help mitigate the issue to some extent?