So we tested both Linux and Windows, and we noticed that with Windows memory-mapped files, versus the Linux hybrid approach, Windows uses all of the OS RAM. So if the box has 128GB, with 30GB allocated to Elasticsearch and 98GB left to the OS, Windows takes it all up for memory-mapped files. Furthermore, Windows got slow, with lots of page faulting going on.
Does this mean that on Windows the whole index has to fit into memory through memory-mapped files? So if my index is 256GB on disk, do I also need 256GB of RAM? I would assume you need at least the working set for the query, but running a variety of different queries causes Windows to move data in and out of the mapped files, causing slowness.
I have found something similar. Although both my -Xmx and Initial/Maximum memory pool values are set to 4GB, when indexing as few as 100K documents the memory just climbs and climbs, consuming all available RAM (and then some). It's a bit of a big deal: what's the point of having a min/max value if it gets ignored entirely and can still bring a machine to its knees? I've seen the ES process consuming >16GB of RAM, despite the 4GB setting.
What can I do to stop this? It's problematic to put this on a production machine when it can kill the machine despite the very settings designed to protect the OS and any other services.
-Xmx sets the maximum Java heap space Elasticsearch needs to operate: loading Lucene index metadata, etc. Basically, what it needs to do its day-to-day job.
But then there are the actual indexes, which are stored on disk. On Windows, ES is configured to use memory-mapped files, i.e. it's using the OS file cache. This is controlled by the OS, not Elasticsearch. So whenever ES searches those files, it prompts the OS to provide the data through the file cache.
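As a rough illustration of that split (a Python sketch of the general mmap mechanism, not of anything Lucene-specific): reading through a memory map pulls pages into the OS file cache and maps them into the process's address space, rather than copying the data into the process heap.

```python
import mmap
import os
import tempfile

# Write a small stand-in "index segment" to disk.
fd, path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as f:
    f.write(b"term:elasticsearch " * 1000)

# Map the file instead of read()-ing it into the process heap.
with open(path, "rb") as f:
    with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
        # This slice page-faults on first touch; the OS loads the page
        # into its file cache and maps it in. No heap buffer holds the
        # file contents, so a JVM heap limit like -Xmx never sees it.
        first = m[:18]

print(first)  # b'term:elasticsearch'
os.unlink(path)
```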
So I would assume you need as much RAM as the working set of your queries requires.
Nah, it's fine to have less RAM than the working set, as long as you are willing to wait for the IO to page it in.
It's disk caching. It's normal and fine. I don't know how Windows reports it, but on Linux this memory shows up as "virtual" while the heap memory is "resident". People freak out when they see virtual taking up all the RAM all the time, but it's fine: the OS handles the cache and releases memory when it needs to.
Are you actually seeing problems, like GC thrashing or RAM not being freed when it should be?
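You can see the virtual-vs-resident distinction directly with a small sketch like this (Linux-only, since it reads /proc/self/status; it won't run on Windows): mapping a large file grows virtual size immediately, while resident size barely moves until the pages are actually touched.

```python
import mmap
import os
import tempfile

def vm_kb():
    """Return {"VmSize": kB, "VmRSS": kB} from /proc/self/status (Linux only)."""
    out = {}
    with open("/proc/self/status") as status:
        for line in status:
            if line.startswith(("VmSize:", "VmRSS:")):
                key, value = line.split(":", 1)
                out[key] = int(value.split()[0])
    return out

# Map a 64 MiB sparse file and compare before/after.
fd, path = tempfile.mkstemp()
os.truncate(path, 64 * 1024 * 1024)
before = vm_kb()
with open(path, "rb") as f:
    m = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
    after = vm_kb()
    m.close()
os.close(fd)
os.unlink(path)

virt_delta = after["VmSize"] - before["VmSize"]
res_delta = after["VmRSS"] - before["VmRSS"]
# Virtual grows by roughly the full 64 MiB; resident stays near zero
# because we never touched the mapped pages.
print(virt_delta, res_delta)
```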
Hmm... That explains why it doesn't show up in any JVM monitors =) Thanks.
We do occasionally see that, yeah. One machine has 64GB of RAM, with other things running on it. ES is set to 4GB, but it often climbs to 16GB (the server was then at 98% or so), and more than once this has affected indexing: it slows right down, and we get errors on some of the bulk-inserted documents, which is why I was wondering how to limit it.
But no real GC thrashing was going on; there was some, but nothing I'd expect to have a big impact.
For the most part, which pages the OS keeps in its cache is outside the application's control. POSIX (Linux, Unix, and others) gives you hints like "I won't need this file again, please don't cache it" and "I think I'm going to need this file again", but it isn't much more granular than that. At least, I think that's how it works; I'm not an expert there.
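Those hints are the posix_fadvise() family of advice calls. A minimal Python sketch (POSIX-only; the call doesn't exist on Windows, hence the hasattr guard):

```python
import os
import tempfile

fd, path = tempfile.mkstemp()
os.write(fd, b"x" * 4096)

if hasattr(os, "posix_fadvise"):  # POSIX-only; absent on Windows builds
    # "I think I'm going to need this file" - ask the OS to read ahead.
    os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_WILLNEED)
    # "I won't need this again, please don't cache it" - drop cached pages.
    os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_DONTNEED)
    advised = True
else:
    advised = False

os.close(fd)
os.unlink(path)
print("advised:", advised)
```

Either way these are only advice: the kernel is free to ignore them, which is why the cache stays under OS control rather than the application's.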
98% of what? RAM in use? I'd be upset if my production servers didn't have 100% of their RAM in use, with the spare going to disk caching. If it's CPU, then that's worth thinking about.
By affecting indexing, do you mean that you can't sustain the indexing rate? Like, the time it takes to index similar data gets longer and longer, and you start to see rejections from the thread pool that handles bulk requests?
A couple of things about that: it's normal for a near-empty index to be faster to index into, but you should still expect a reasonably high rate even on a full index. Exactly what that rate is depends on lots and lots of things, too many for there to be a single number to expect. There is also a lot you can do to speed it up; there are plenty of guides covering things like refresh interval, bulk size, etc.
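For example, a common bulk-load tweak looks something like this (a sketch only: refresh_interval and number_of_replicas are real Elasticsearch index settings, but the values here are just illustrative and you'd tune them for your own load):

```python
import json

# Settings often relaxed during a bulk load and restored afterwards.
bulk_load_settings = {
    "index": {
        "refresh_interval": "30s",   # refresh less often while loading
        "number_of_replicas": 0,     # add replicas back after the load
    }
}

# This is the kind of body you would PUT to /<index>/_settings.
payload = json.dumps(bulk_load_settings, indent=2)
print(payload)
```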
If you can post an example of your documents (faked out all you like, but with the string lengths about right) and give us a sense of the hardware, we can tell you whether the indexing rate you are seeing is sensible.
Yeah, 98% taken by applications, so the OS only had 2% left for file caching. Normally, most of the available RAM is used by the OS for caching; in this case, though, the ES process itself was consuming so much RAM that none was left for the cache. At that point we had two failed bulk requests, and the indexing rate dropped to less than 5 documents per second (typically it's more like 35-50 per second). Then, all of a sudden, the ES process released about 8GB of RAM and things continued normally.
Our data is fairly heavy to index, and the rate we're getting in ES is consistent with what we get with Solr, so I'm not overly concerned about it. We do see a slowdown as we go, but it's normally able to maintain ~20/s up to the end.