Killing swap once and for all, file descriptors too, Linux and Windows

Josh_Harrison · December 7, 2013, 6:18pm

I'd probably mention it in the default elasticsearch.yml file, with a line in the description of mlockall like:
If running as a Linux service from an RPM build, it (may be/is) necessary to set LimitMEMLOCK=infinity in /etc/systemd/system/elasticsearch.service.
Mostly because all of the documentation, suggestions and Stack Overflow answers just say "turn on mlockall", so having this right next to it may be helpful.

Alternatively, I don't know whether that "error 0" happens in other circumstances (it probably does), but when that error occurs it could be worth writing something like the above into the log as a possible solution.

Finally, I figured out that the swap usage reported in ElasticHQ and similar tools appears to be the swap usage of the entire host system, not of ES. Since swap is a bad thing for ES, would it be possible to get a property in _node that shows how much ES itself is using?
Thanks Alex!
-Josh

On Dec 7, 2013, at 7:07 AM, Alexander Reelsen alr@spinscale.de wrote:

Hey Josh,

Glad it is working now. Can you have a look at the Elasticsearch docs and tell me if we can improve them somehow (or mention it differently in the sysconfig file, maybe)? Any pointers on how we can make this more failure-proof, for example by mentioning it more prominently somewhere? (Apart from the fact that I should have thought about systemd earlier.)

Any help is much appreciated!

--Alex

On Fri, Dec 6, 2013 at 8:20 PM, Josh Harrison hijakk@gmail.com wrote:
Yep, using the RPM.
I ended up following the settings in elastic/elasticsearch pull request #3355 on GitHub ("Limits are not consumed using Systemd (ulimit -n / ulimit -l)", by skymeyer), and now top reports a swap usage of 0, which agrees with /proc/pid/smaps. ElasticHQ still reports swap usage, though, so I'm not sure what to make of that.
So, for those of you who run into this in the future, try this:
After applying the fix:

Remove the limits from /etc/security/limits.conf:

    elasticsearch soft nofile 65535
    elasticsearch hard nofile 65535
    elasticsearch - memlock unlimited

/etc/sysconfig/elasticsearch:

    # Maximum number of open files
    MAX_OPEN_FILES=65535

    # Maximum amount of locked memory
    MAX_LOCKED_MEMORY=unlimited

/etc/elasticsearch/elasticsearch.yml:

    bootstrap.mlockall: true

The actual fix, in /etc/systemd/system/elasticsearch.service:

    [Service]
    ...
    LimitMEMLOCK=infinity
    LimitNOFILE=65535

Note: run the following command after changing the above file:

    systemctl --system daemon-reload

Result: max_open_files is applied and there is no mlockall error:

    /etc/rc.d/init.d/elasticsearch restart
    [2013-07-18 11:20:21,476][INFO ][bootstrap ] max_open_files [65511]
    [2013-07-18 11:20:21,616][INFO ][node ] [node1] {0.90.2}[28558]: initializing ...
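To confirm that systemd actually applied LimitMEMLOCK and LimitNOFILE to the running node, rather than trusting the config files, one option is to read /proc/<pid>/limits for the Elasticsearch process; that file is where "Max open files" / "Max locked memory" figures like the ones quoted later in this thread come from. A minimal Python sketch, assuming a Linux host; `parse_limits` and `check_es_limits` are hypothetical helper names, and 65535 is simply the value used above:

```python
import re


def parse_limits(text):
    """Parse the contents of /proc/<pid>/limits into {name: (soft, hard)}."""
    limits = {}
    for line in text.splitlines()[1:]:  # skip the "Limit  Soft Limit ..." header row
        # Columns are padded, so runs of two or more spaces separate the fields.
        fields = re.split(r"\s{2,}", line.strip())
        if len(fields) >= 3:
            limits[fields[0]] = (fields[1], fields[2])
    return limits


def check_es_limits(pid, min_open_files=65535):
    """Return a list of problems with the limits of process `pid` (Linux only)."""
    with open(f"/proc/{pid}/limits") as f:
        limits = parse_limits(f.read())
    problems = []
    # mlockall needs an unlimited (or at least heap-sized) memlock limit.
    soft, _hard = limits.get("Max locked memory", ("missing", "missing"))
    if soft != "unlimited":
        problems.append(f"Max locked memory is {soft}, expected unlimited")
    soft, _hard = limits.get("Max open files", ("0", "0"))
    if soft != "unlimited" and int(soft) < min_open_files:
        problems.append(f"Max open files is {soft}, expected >= {min_open_files}")
    return problems
```

For example, calling `check_es_limits(28558)` with the node's PID (the bracketed number in the startup log) should return an empty list when both limits took effect; reading another user's /proc entries may require running it as root or as the elasticsearch user.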

On Friday, December 6, 2013 1:13:34 AM UTC-8, Alexander Reelsen wrote:
Hey,

OK, so mlockall is configured but not applied on startup; at least now you know why it is swapping.

How are you starting Elasticsearch? Do you use the RPM? If so, I assume you set MAX_LOCKED_MEMORY=unlimited in the sysconfig file? If not, can you try that?
Another possible issue (depending on how ES gets started) is that the memlock limit is configured for the root user, but not for the elasticsearch user (wild guess here).

--Alex

On Fri, Dec 6, 2013 at 9:24 AM, Joshua Harrison hij...@gmail.com wrote:
At the moment, the only reason for the mixed environment is to provide an easy migration path for when we burn down the Windows node and reimage it to RHEL. If we can get the Windows node to be as stable as its Linux counterparts, though, we may just stick with it.

I am indeed getting an mlockall error:
[2013-12-05 21:25:34,439][WARN ][common.jna ] Unknown mlockall error 0
ulimit -l unlimited has been set, which is the only suggestion I've been able to find for this particular error.
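One thing worth knowing here: `ulimit -l unlimited` only changes the limit of the shell it is run in and of processes started from that shell. A service started by systemd does not inherit it and, as the fix elsewhere in this thread shows, takes its limits from the unit file (LimitMEMLOCK) instead. A small Python sketch using the standard `resource` module to print the memlock limit of the current process (helper names are hypothetical); comparing its output in an interactive shell with /proc/<pid>/limits of the running node makes the difference visible:

```python
import resource


def describe_limit(value):
    """Render an rlimit value the way ulimit does ("unlimited" for RLIM_INFINITY)."""
    return "unlimited" if value == resource.RLIM_INFINITY else str(value)


def memlock_limits():
    """Return the (soft, hard) RLIMIT_MEMLOCK of *this* process as strings.

    Values are in bytes, whereas `ulimit -l` reports kilobytes.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_MEMLOCK)
    return describe_limit(soft), describe_limit(hard)


if __name__ == "__main__":
    soft, hard = memlock_limits()
    print(f"max locked memory: soft={soft} hard={hard} (bytes)")
```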

I found this thread on the Google Group, but it looks like they didn't find a solution.
Thanks,
Josh

On Dec 6, 2013, at 12:09 AM, Alexander Reelsen a...@spinscale.de wrote:

Hey,

I would not recommend running a mixed environment, as debugging becomes a real PITA when you have to compare performance statistics across different operating systems. That might be something for the really adventurous among us.

Now back to the topic: if you enabled mlockall but the elasticsearch process is still swapping, then mlockall is apparently not in effect. (Can you please verify the swapping by checking top or the /proc file system? To be honest, I have no idea how ElasticHQ measures this; maybe Roy can help here.) Can you check the elasticsearch log file on startup? Maybe an mlockall error is written out there, which would mean that it is not enabled. (We recently added a way to see in the nodes info whether mlockall succeeded on startup, but it is not yet part of a 0.90 release.)
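For the "check the /proc file system" step, a per-process swap figure can be read from /proc/<pid>/smaps, which covers only that one process, unlike host-wide numbers such as those from free or /proc/meminfo. Josh later uses this file to confirm a swap usage of 0. A minimal Python sketch, assuming Linux and a kernel whose smaps lists a `Swap:` field (in kB) for each mapping; the function names are hypothetical:

```python
def swap_kb_from_smaps(text):
    """Sum the per-mapping "Swap:" entries (in kB) of an smaps dump."""
    total = 0
    for line in text.splitlines():
        # Matches "Swap:      4 kB" but not "SwapPss:" (note the colon).
        if line.startswith("Swap:"):
            total += int(line.split()[1])
    return total


def process_swap_kb(pid):
    """Swap used by a single process, read from /proc/<pid>/smaps (Linux only)."""
    with open(f"/proc/{pid}/smaps") as f:
        return swap_kb_from_smaps(f.read())
```

If `process_swap_kb(pid)` returns 0 for the node's PID, none of the ES process's memory is currently swapped out, regardless of what the system-wide swap total says.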

--Alex

On Fri, Dec 6, 2013 at 8:43 AM, Josh Harrison hij...@gmail.com wrote:
So, for ES, swap is a bad thing, right?
I'm currently running a mixed Windows and Linux environment, all on 0.90.5.
On Linux (RHEL):
Locked memory has been set to unlimited. mlockall is (supposed to be) on. ES_HEAP_SIZE is set to 12GB (the server has 24GB of RAM).
Limits applied to the Linux process:

    Limit                 Soft Limit   Hard Limit   Units
    Max open files        65536        65536        files
    Max locked memory     unlimited    unlimited    bytes

I still see swap space being used: just a few MB at first, but I've seen it go as high as 800MB in ElasticHQ.

The only way I can stop ES from swapping is to turn off swap for the whole system, which seems like overkill. File descriptor limits on Linux seem fine.

What am I missing? I feel like there's probably one little thing that will make it all click into place and I've just glazed over it.

On Windows:
There is no mlockall. I briefly tried disabling the page file and rebooting the server, but when ES came back up, ElasticHQ showed the same 32GB of swap usage as it had while the page file existed. ES_HEAP_SIZE is set to 24GB (the server has 64GB); at the recommended 30GB I was getting even more instability: rivers dropping, simple queries taking 5-10 minutes to respond, etc. I still get to enjoy those things at 24GB, just not quite as often.
There is also, apparently, no working way to check how many file descriptors are in use, or how many the process is allowed to open; I just get back -1 (unknown). Do file descriptor counts not matter on Windows?
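The heap sizes mentioned in this thread (12GB on a 24GB Linux server, a "recommended" 30GB on the 64GB Windows server) match a common rule of thumb: give the heap about half of the RAM, leaving the rest for the operating system's filesystem cache that Lucene relies on, and keep it below roughly 32GB, above which the JVM can no longer use compressed object pointers. A tiny Python sketch of that arithmetic; the 30GB cap is an assumption taken from the figure quoted above, not an official constant:

```python
def suggested_heap_gb(ram_gb, cap_gb=30):
    """Rule-of-thumb heap size: half of the RAM, but never more than cap_gb.

    The other half is left to the OS (filesystem cache), and the cap keeps the
    heap under ~32GB so the JVM can keep using compressed oops.
    """
    return min(ram_gb // 2, cap_gb)


for ram in (24, 64):
    print(f"{ram}GB RAM -> {suggested_heap_gb(ram)}GB heap")
```

This prints 12GB for the 24GB server and 30GB for the 64GB one, matching the values discussed above.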

Is swap not a bad word on Windows as far as ES is concerned? If it is, since it sounds like there are at least a few people out there running ES on Windows servers in production, how do I manage it properly?

Ideally, we'll be moving the Windows server to RHEL, but there may be enough red tape in the way that it will take a while.

Thanks!

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearc...@googlegroups.com.

To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/be6818aa-6593-47ab-93cf-76104da7406a%40googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.
