Killing swap once and for all (file descriptors too), Linux and Windows

So, for ES, swap is a bad thing, right?
I'm currently running a mixed Windows and Linux environment, all on 0.90.5.
On Linux (RHEL):
Locked memory has been set to unlimited, and mlockall is (supposed to be) on.
ES_HEAP_SIZE is set to 12 GB (the server has 24 GB of RAM).

Limits applied to the Linux process:

    Max open files      65536      65536      files
    Max locked memory   unlimited  unlimited  bytes
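
To read these limits for the running elasticsearch process, rather than for a login shell, one option is to parse /proc/<pid>/limits, which uses the same columns as the lines above. This is a minimal sketch. It assumes the fixed-width layout the Linux kernel prints (name 25 characters, soft 20, hard 20, then units, each followed by one space). The function names are hypothetical.

```python
from pathlib import Path

# Assumed kernel layout of /proc/<pid>/limits:
# "%-25s %-20s %-20s %-10s" -> name, soft limit, hard limit, units.
_NAME, _SOFT, _HARD = slice(0, 25), slice(26, 46), slice(47, 67)


def parse_limits(text: str) -> dict[str, tuple[str, str, str]]:
    """Map each limit name to (soft, hard, units); units may be ''."""
    limits: dict[str, tuple[str, str, str]] = {}
    for line in text.splitlines()[1:]:  # first line is the column header
        if not line.strip():
            continue
        name = line[_NAME].strip()
        soft = line[_SOFT].strip()
        hard = line[_HARD].strip()
        units = line[67:].strip()  # some rows (e.g. nice priority) have no units
        limits[name] = (soft, hard, units)
    return limits


def process_limits(pid: int) -> dict[str, tuple[str, str, str]]:
    """Read and parse the limits of a running process (Linux only)."""
    return parse_limits(Path(f"/proc/{pid}/limits").read_text())
```

For example, if the memlock limit really reached the process, `process_limits(es_pid)["Max locked memory"]` should be `('unlimited', 'unlimited', 'bytes')`. Here `es_pid` is the PID of the elasticsearch JVM.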

I still see swap being used: just a few MB at first, but I've seen it climb
as high as 800 MB in ElasticHQ.

The only way I can stop ES from swapping is to disable swap for the whole
system, which seems like overkill. File descriptor limits on Linux seem
fine.

What am I missing? I feel like there's probably one little thing that will
make it all click into place, and I've just glossed over it.

On Windows:
There is no mlockall. I briefly tried disabling the page file and rebooting
the server, but when ES came back up, ElasticHQ showed the same 32 GB of
swap usage as it did when the page file was enabled. ES_HEAP_SIZE is set to
24 GB (the server has 64 GB). At the recommended 30 GB I was getting even
more instability: rivers dropping, simple queries taking 5-10 minutes to
return, etc. I still get to enjoy those problems at 24 GB, just not quite
as often.
There also seems to be no working way to check how many file descriptors
are in use, or how many the process is allowed to open; I just get back -1
(unknown). Do file descriptor counts not matter on Windows?

Is swap not a dirty word on Windows as far as ES is concerned? If it is,
how do I manage it properly, given that there seem to be at least a few
people out there running ES in production on Windows servers?

Ideally, we'll be moving the Windows server to RHEL, but there may be
enough red tape in the way that it will take a while.

Thanks!

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/be6818aa-6593-47ab-93cf-76104da7406a%40googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

Hey,

I would not recommend running a mixed environment, as debugging becomes a
real PITA when you have to compare performance statistics across different
operating systems. That might be something for the really adventurous among us.

Now back to the topic: if you enabled mlockall but the elasticsearch process
is still swapping (can you please verify by checking top or the /proc file
system? To be honest, I have no idea how ElasticHQ measures this; maybe
Roy can help here), then mlockall is apparently not in effect. Can you check
the elasticsearch log file on startup? If an mlockall error is written out
there, mlockall is not enabled. (We recently added a way to see in the nodes
info whether mlockall succeeded on startup, but it is not yet part of a 0.90
release.)
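
To check swap for the process independently of ElasticHQ, one option is to sum the per-mapping "Swap:" fields in /proc/<pid>/smaps. Each field gives the swapped-out size of one mapping in kB. This is a minimal sketch assuming a Linux host; the helper names are hypothetical.

```python
from pathlib import Path


def swapped_kb(smaps_text: str) -> int:
    """Sum the per-mapping 'Swap:' entries (in kB) from /proc/<pid>/smaps text."""
    total = 0
    for line in smaps_text.splitlines():
        # Matches "Swap:   4 kB" but not "SwapPss:", which is a different field.
        if line.startswith("Swap:"):
            total += int(line.split()[1])
    return total


def process_swap_kb(pid: int) -> int:
    """Total swapped-out memory of a running process, in kB (Linux only)."""
    return swapped_kb(Path(f"/proc/{pid}/smaps").read_text())
```

A non-zero `process_swap_kb(es_pid)` would confirm that the JVM is being swapped, which suggests that mlockall did not take effect.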

--Alex

On Fri, Dec 6, 2013 at 8:43 AM, Josh Harrison hijakk@gmail.com wrote:
[quoted text trimmed]

At the moment, the only reason for the mixed environment is to provide an easy migration path for when we burn down the Windows node and reimage it with RHEL. If we can make the Windows node as stable as its Linux counterparts, though, we may just stick with it.

I am indeed getting an mlockall error:
[2013-12-05 21:25:34,439][WARN ][common.jna ] Unknown mlockall error 0
I have set ulimit -l unlimited, which is the only suggestion I've been able to find for this particular error.
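
Because 0.90 does not yet report the mlockall result in the nodes info, the startup log is where this shows up. Here is a small sketch that picks mlockall warnings out of a log. The line format is inferred from the single log line above, and the regex and function name are hypothetical, not part of elasticsearch.

```python
import re

# Inferred from lines like:
# [2013-12-05 21:25:34,439][WARN ][common.jna ] Unknown mlockall error 0
_LOG_LINE = re.compile(
    r"^\[(?P<ts>[^\]]+)\]\[(?P<level>\w+)\s*\]\[(?P<logger>[^\]]+?)\s*\]\s*(?P<msg>.*)$"
)


def mlockall_problems(log_lines):
    """Return (timestamp, message) for WARN/ERROR lines that mention mlockall."""
    problems = []
    for line in log_lines:
        m = _LOG_LINE.match(line)
        if m and m["level"] in ("WARN", "ERROR") and "mlockall" in m["msg"]:
            problems.append((m["ts"], m["msg"]))
    return problems
```

An empty result after a restart means that no mlockall warning was logged. It does not by itself prove that memory is locked.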

I found this thread, https://groups.google.com/forum/#!topic/elasticsearch/0CaBak7sdRE, but it looks like they didn't find a solution.
Thanks,
Josh

On Dec 6, 2013, at 12:09 AM, Alexander Reelsen alr@spinscale.de wrote:
[quoted text trimmed]

Hey,

OK, so mlockall is configured but does not take effect on startup; at least
you now know why it is swapping.

How are you starting elasticsearch? Do you use the RPM? If so, I assume you
set MAX_LOCKED_MEMORY=unlimited in /etc/sysconfig/elasticsearch? If not, can
you try that? Another possible issue (depending on how ES gets started) is
that the memlock limit is configured for the root user, but not for the
elasticsearch user (wild guessing here).
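
Resource limits such as memlock belong to each process and are inherited from whatever launched it. So what matters is the limit the elasticsearch process itself sees, not the one in your interactive shell. Python's standard `resource` module (Unix only) can show the memlock limit of the current process. The sketch below is only meaningful when run as the same user and through the same start-up mechanism as elasticsearch; it is an illustration, not an elasticsearch tool.

```python
import resource


def memlock_limit():
    """Return (soft, hard) RLIMIT_MEMLOCK in bytes for this process; None means unlimited."""
    soft, hard = resource.getrlimit(resource.RLIMIT_MEMLOCK)

    def normalize(value):
        return None if value == resource.RLIM_INFINITY else value

    return normalize(soft), normalize(hard)


if __name__ == "__main__":
    soft, hard = memlock_limit()
    print("memlock soft:", "unlimited" if soft is None else soft,
          "hard:", "unlimited" if hard is None else hard)
```

If this prints a finite value under the service's start-up path while `ulimit -l` in your shell says unlimited, the limit is not reaching the service.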

--Alex

On Fri, Dec 6, 2013 at 9:24 AM, Joshua Harrison hijakk@gmail.com wrote:
[quoted text trimmed]

I wonder whether this article could help you (on Windows): http://www.oracle.com/technetwork/java/javase/tech/largememory-jsp-137182.html

--
David Pilato | Technical Advocate | Elasticsearch.com
@dadoonet | @elasticsearchfr

On December 6, 2013 at 09:24:33, Joshua Harrison (hijakk@gmail.com) wrote:
[quoted text trimmed]

Yep, using the RPM.
I ended up following the settings in elastic/elasticsearch pull request
#3355 on GitHub ("Limits are not consumed using Systemd (ulimit -n /
ulimit -l)", by skymeyer), and now top says I have a swap usage of 0, which
agrees with /proc/<pid>/smaps. ElasticHQ still reports swap usage, though,
so I'm not sure what to make of that.
So, for those of you who run into this in the future, here is the setup
after applying the fix:

Limits removed from /etc/security/limits.conf:

    elasticsearch soft nofile 65535
    elasticsearch hard nofile 65535
    elasticsearch - memlock unlimited

/etc/sysconfig/elasticsearch:

    # Maximum number of open files
    MAX_OPEN_FILES=65535

    # Maximum amount of locked memory
    MAX_LOCKED_MEMORY=unlimited

/etc/elasticsearch/elasticsearch.yml:

    bootstrap.mlockall: true

The actual fix, in /etc/systemd/system/elasticsearch.service:

    [Service]
    ...
    LimitMEMLOCK=infinity
    LimitNOFILE=65535

Note: run the following command after changing the above file:

    systemctl --system daemon-reload
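
After the reload, `systemctl show elasticsearch -p LimitMEMLOCK -p LimitNOFILE` prints, as KEY=VALUE lines, the limits systemd will apply to the unit. This sketch runs that command and parses its output. Depending on the systemd version, "infinity" may be printed as `infinity` or as 18446744073709551615 (2**64 - 1); the sketch treats both as unlimited, and that is an assumption. The unit name `elasticsearch` and the helper names are assumptions too.

```python
import subprocess

# Assumption: older systemd prints infinity as 2**64 - 1, newer prints "infinity".
UNLIMITED_VALUES = {"infinity", str(2**64 - 1)}


def parse_show(output: str) -> dict[str, str]:
    """Parse `systemctl show -p ...` KEY=VALUE output into a dict."""
    props = {}
    for line in output.splitlines():
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def unit_limits(unit: str = "elasticsearch") -> dict[str, str]:
    """Ask systemd which limits it will apply to `unit` (requires systemctl)."""
    out = subprocess.run(
        ["systemctl", "show", unit, "-p", "LimitMEMLOCK", "-p", "LimitNOFILE"],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_show(out)


def memlock_unlimited(props: dict[str, str]) -> bool:
    """True if the unit's LimitMEMLOCK is reported as unlimited."""
    return props.get("LimitMEMLOCK") in UNLIMITED_VALUES
```

Checking `memlock_unlimited(unit_limits())` before restarting shows whether the unit file change was picked up, without waiting for the mlockall warning in the log.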

Result: max_open_files is picked up and there is no mlockall error:

    /etc/rc.d/init.d/elasticsearch restart
    [2013-07-18 11:20:21,476][INFO ][bootstrap ] max_open_files [65511]
    [2013-07-18 11:20:21,616][INFO ][node ] [node1] {0.90.2}[28558]: initializing ...
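
To check this after each restart, a script can pull the reported max_open_files value out of the log. The value printed above (65511) is slightly below the configured 65535, so compare it against a rough lower bound rather than the exact limit. This sketch assumes the bootstrap line looks like the one above. The log path in the usage note is an assumption (the RPM default for a cluster named "elasticsearch").

```python
import re

# Inferred from: [2013-07-18 11:20:21,476][INFO ][bootstrap ] max_open_files [65511]
_MAX_OPEN_FILES = re.compile(r"\[bootstrap\s*\]\s*max_open_files \[(\d+)\]")


def max_open_files(log_lines):
    """Return the first max_open_files value reported at startup, or None."""
    for line in log_lines:
        m = _MAX_OPEN_FILES.search(line)
        if m:
            return int(m.group(1))
    return None
```

For example: `with open("/var/log/elasticsearch/elasticsearch.log") as f: print(max_open_files(f))`. A result of None means the line was not logged, for example because the bootstrap logger is quieter on your install.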

On Friday, December 6, 2013 1:13:34 AM UTC-8, Alexander Reelsen wrote:
[quoted text trimmed]

Hey Josh,

glad it is working now. Can you have a look at

and tell me if we can improve the docs somehow (or maybe mention it
differently in the sysconfig file)? Any pointers on how we can make this
more failure-proof? Should we mention it more prominently somewhere? (Apart
from the fact that I should have thought about systemd earlier ;-)

Any help is much appreciated!

--Alex

On Fri, Dec 6, 2013 at 8:20 PM, Josh Harrison hijakk@gmail.com wrote:

Yep, using the RPM.
I ended up following the settings in here:
Limits are not consumed using Systemd (ulimit -n / ulimit -l) by skymeyer · Pull Request #3355 · elastic/elasticsearch · GitHub and now top says
I have a swap usage of 0, agreeing with /proc/pid/smaps - though elastichq
still reports swap usage, so I'm not sure what to make of that.
So for those of you that run into this in the future, try this:
After applying fix

Removing limits from /etc/security/limits.conf

elasticsearch soft nofile 65535
elasticsearch hard nofile 65535
elasticsearch - memlock unlimited

/etc/sysconfig/elasticsearch:

Maximum number of open files

MAX_OPEN_FILES=65535

Maximum amount of locked memory

MAX_LOCKED_MEMORY=unlimited

/etc/elasticsearch/elasticsearch.yml:

bootstrap.mlockall: true

Actual fix in /etc/systemd/system/elasticsearch.service

[Service]
...
LimitMEMLOCK=infinity
LimitNOFILE=65535

Note: run the following command after altering the above file:

systemctl --system daemon-reload

Result max_open_files and no mlockall error:

/etc/rc.d/init.d/elasticsearch restart
[2013-07-18 11:20:21,476][INFO ][bootstrap ] max_open_files [65511]
[2013-07-18 11:20:21,616][INFO ][node ] [node1] {0.90.2}[28558]: initializing ...

On Friday, December 6, 2013 1:13:34 AM UTC-8, Alexander Reelsen wrote:

Hey,

ok, so mlockall is configured but not set on startup - at least you know
why it is swapping.

How are you starting elasticsearch? Do you use the RPM? If so, I guess
you set MAX_LOCKED_MEMORY=unlimited in there? If not, can you try?
Another issue might be (depending on how ES gets started), that the
memlock setting is configured for the root user, but not for the
elasticsearch one (wild guessing here).

--Alex

On Fri, Dec 6, 2013 at 9:24 AM, Joshua Harrison hij...@gmail.com wrote:

At the moment, the only reason for the mixed environment is to provide
an easy migration path for when we burn down the windows node and reimage
to RHEL. If we can force the Windows node to be as stable as its linux
counterparts, we may just stick with it, though.

I am indeed getting an mlock error
[2013-12-05 21:25:34,439][WARN ][common.jna ] Unknown
mlockall error 0
ulimit -l unlimited has been set, which is the only suggestion I've been
able to find for this particular error.

Found this thread, Redirecting to Google Groups
elasticsearch/0CaBak7sdRE but it looks like they didn't find a solution.
Thanks,
Josh

On Dec 6, 2013, at 12:09 AM, Alexander Reelsen a...@spinscale.de
wrote:

Hey,

I would not recommend running a mixed environment, as debugging becomes a
real pain when you have to compare performance statistics from different
operating systems; that might be something for the really adventurous
among us.

Now back to the topic: if you enabled mlockall but the elasticsearch
process is still swapping, mlockall does not seem to be enabled. Can you
please verify the swapping by checking top or the /proc file system? To be
honest, I have no idea how elastichq measures this; maybe Roy can help
here. Also check the elasticsearch log file on startup: if an mlockall
error is written out there, mlockall is not enabled. (We recently added a
flag to the nodes info that shows whether mlockall succeeded on startup,
but it is not yet part of a 0.90 release.)

--Alex
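
One way to check from /proc whether mlockall actually took effect is the VmLck line of /proc/<pid>/status, which reports how much of the process's memory is locked. The sketch below rests on an assumption: with a working mlockall, and ES_HEAP_SIZE setting both the minimum and maximum heap, VmLck should be at least the heap size. That is a heuristic, not something Elasticsearch itself reports, and both function names are hypothetical.

```python
def vm_locked_kb(status_text):
    """Return the VmLck value (kB) from /proc/<pid>/status text, or 0 if absent."""
    for line in status_text.splitlines():
        if line.startswith("VmLck:"):
            # Line format: "VmLck:\t  12582912 kB"
            return int(line.split()[1])
    return 0


def mlockall_looks_effective(status_text, heap_bytes):
    """Heuristic: with mlockall working, at least the whole heap should be locked."""
    return vm_locked_kb(status_text) * 1024 >= heap_bytes
```

A VmLck of 0 kB on a node with bootstrap.mlockall: true would match the "Unknown mlockall error 0" situation described in this thread.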

On Fri, Dec 6, 2013 at 8:43 AM, Josh Harrison hij...@gmail.com wrote:

So, for ES, swap is a bad thing, right?
I'm running, currently, a mixed Windows and Linux environment. All
running 0.90.5.
On linux (RHEL):
Locked memory has been set to unlimited. mlockall is (supposed to be)
on. ES_HEAP_SIZE is set to 12gb (24gb of ram on server).

Limits applied to the Linux process:

    Max open files       65536      65536      files
    Max locked memory    unlimited  unlimited  bytes
I still see swap space getting used, just a few megs at first, and I've
seen it get as far up as 800mb in elastichq

The only way I can get swap to turn off for ES is to turn off swap for
the whole system, which seems kinda overkill. File descriptor limits on
Linux seem fine.

What am I missing? I feel like there's probably one little thing that
will make it all click into place and I've just glazed over it.

On Windows
There is no mlockall. I briefly tried disabling the page file and
rebooting the server, but when ES came back up, elastichq showed the same
32GB of swap usage as it did when the page file was around.
ES_HEAP_SIZE is set to 24GB (64GB of RAM on the server); at the recommended
30GB I was getting even more instability: rivers dropping, response times
of 5-10 minutes for a simple query, etc. I still get to enjoy those things
at 24GB, but not quite as often.
There is also, apparently, no functional way to check file descriptors
in use, or how many the process is allowed to open. I just get back the -1
unknown value. Do file descriptor counts not matter on Windows?

Is swap not a bad word on Windows as far as ES is concerned? If it is,
since it sounds like there are at least a few people out there running
production ES windows servers, how do I manage it properly?

Ideally, we'll be moving the Windows server to RHEL, but there may be
enough red tape in the way that it will take a while.

Thanks!


I'd probably mention it in the default elasticsearch.yml file with a line in the description of mlockall like:
If running as a linux service from an RPM build it (may be/is) necessary to set LimitMEMLOCK=infinity in /etc/systemd/system/elasticsearch.service
Mostly because all of the documentation, suggestions, and Stack Overflow answers just say "turn on mlockall", so having this note right next to that setting may be helpful.

Alternatively, that error 0 probably happens in other circumstances too (I don't know for sure), but it could be worth adding something like the above to the log message when that error occurs, as a possible solution.

Finally, I figured out that the swap usage reported in elasticHQ and other similar tools appears to be the swap usage of the entire host system, not of ES. Since swap is a bad thing for ES, would it be possible to get a property in _node that shows what ES itself is using?
Thanks Alex!
-Josh

On Dec 7, 2013, at 7:07 AM, Alexander Reelsen alr@spinscale.de wrote:

Hey Josh,

glad it is working now. Can you have a look at the relevant page on elastic.co and tell me whether we can improve the docs somehow (or maybe mention it differently in the sysconfig file)? Any pointers on how we can make this more failure-proof? Should it be mentioned more prominently somewhere? (Apart from the fact that I should have thought about systemd earlier ;-))

Any help is much appreciated!

--Alex

On Fri, Dec 6, 2013 at 8:20 PM, Josh Harrison hijakk@gmail.com wrote:
Yep, using the RPM.
I ended up following the settings in elastic/elasticsearch pull request #3355 ("Limits are not consumed using Systemd (ulimit -n / ulimit -l)", by skymeyer), and now top says I have a swap usage of 0, agreeing with /proc/pid/smaps. elastichq still reports swap usage, though, so I'm not sure what to make of that.
So for those of you who run into this in the future, try this:
After applying the fix:

Removing limits from /etc/security/limits.conf

    elasticsearch soft nofile 65535
    elasticsearch hard nofile 65535
    elasticsearch - memlock unlimited

/etc/sysconfig/elasticsearch:

    # Maximum number of open files
    MAX_OPEN_FILES=65535

    # Maximum amount of locked memory
    MAX_LOCKED_MEMORY=unlimited

/etc/elasticsearch/elasticsearch.yml:

    bootstrap.mlockall: true

Actual fix, in /etc/systemd/system/elasticsearch.service:

    [Service]
    ...
    LimitMEMLOCK=infinity
    LimitNOFILE=65535

Note: run the following command after altering the above file:

    systemctl --system daemon-reload

Result: max_open_files is reported and there is no mlockall error:

    /etc/rc.d/init.d/elasticsearch restart
    [2013-07-18 11:20:21,476][INFO ][bootstrap ] max_open_files [65511]
    [2013-07-18 11:20:21,616][INFO ][node ] [node1] {0.90.2}[28558]: initializing ...
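
The /proc/pid/smaps check mentioned above can be scripted. smaps lists every memory mapping of the process, each with its own "Swap:" line, and summing those lines gives the total amount of that process's memory that is swapped out. A minimal Python sketch, Linux only, with a hypothetical helper name:

```python
def smaps_swap_kb(smaps_text):
    """Sum the per-mapping "Swap:" lines of /proc/<pid>/smaps text, in kB."""
    total = 0
    for line in smaps_text.splitlines():
        # "SwapPss:" (newer kernels) does not match the "Swap:" prefix,
        # so each mapping is counted once.
        if line.startswith("Swap:"):
            total += int(line.split()[1])
    return total
```

A result of 0 for the Elasticsearch PID corresponds to the "swap usage of 0" that top reported after the fix.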


I have no idea how elastichq is measuring this to be honest, maybe Roy can
help here?

The swap space alert uses the following formula, taken from the
endpoint (_cluster/nodes/stats?all=1):

    formula:"stats.os.swap.used_in_bytes / 1024 / 1024",
    upper_limit:["1", "1"],
    comment:"Any use of swap by the JVM, no matter how small, can greatly impact the speed of the garbage collector."

It's merely a warning, as the comment states. If my formula is incorrect, or needs tuning, please let me know and I will change it accordingly. Honestly, this warning appears on every instance I deploy on windows. Haven't looked on Linux, so maybe my acceptable threshold is incorrect?
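
For reference, here is a minimal Python sketch of the same check applied to a nodes stats response. It assumes the response has the shape {"nodes": {<node id>: {"name": ..., "os": {"swap": {"used_in_bytes": ...}}}}} and that the alert should fire above 1 MB; the helper name and the threshold constant are hypothetical. As discussed elsewhere in this thread, os.swap is host-wide swap, so this flags swapping by any process on the node, not only by the JVM.

```python
import json

SWAP_ALERT_MB = 1  # assumed meaning of upper_limit ["1", "1"]


def swap_alerts(nodes_stats_json):
    """Map node name -> swap used (MB) for nodes over the alert threshold."""
    alerts = {}
    nodes = json.loads(nodes_stats_json).get("nodes", {})
    for node_id, node in nodes.items():
        used = node.get("os", {}).get("swap", {}).get("used_in_bytes")
        if used is None or used < 0:
            continue  # not reported (or reported as unknown) for this node
        used_mb = used / 1024 / 1024  # same arithmetic as the formula above
        if used_mb > SWAP_ALERT_MB:
            alerts[node.get("name", node_id)] = used_mb
    return alerts
```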


ElasticHQ uses ES node stats, and ES node stats uses sigar, and sigar uses
the file /proc/meminfo.

If you want to find out whether the ES JVM is using swap, you can use

    cat /proc/<pid>/status | grep VmSwap

where <pid> is the process ID of the ES JVM.
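
To see the difference between what the node stats report and what the JVM itself is doing, the two numbers can be read side by side. A minimal Python sketch, assuming that the host-wide figure is SwapTotal minus SwapFree from /proc/meminfo (the file sigar reads) and that the kernel exposes VmSwap in /proc/<pid>/status; both helper names are hypothetical:

```python
def meminfo_swap_used_kb(meminfo_text):
    """Host-wide swap in use (kB): SwapTotal - SwapFree from /proc/meminfo."""
    values = {}
    for line in meminfo_text.splitlines():
        key, _, rest = line.partition(":")
        if key in ("SwapTotal", "SwapFree"):
            values[key] = int(rest.split()[0])
    return values["SwapTotal"] - values["SwapFree"]


def process_swap_kb(status_text):
    """Swap used by one process (kB) from the VmSwap line of /proc/<pid>/status.

    Returns None if the kernel does not report VmSwap.
    """
    for line in status_text.splitlines():
        if line.startswith("VmSwap:"):
            return int(line.split()[1])
    return None
```

If the host-wide figure is in the hundreds of megabytes while VmSwap for the ES PID is 0, the swap belongs to other processes on the machine.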

On any Linux system, using swap is a decision of the kernel, and it does
not necessarily mean a bad thing. To persuade the kernel not to swap any
more, the swappiness setting, which you can check with

    sysctl -a | grep swappiness

should be set to a value other than the default of 60 (a lower value makes
the kernel less eager to swap). See also

Jörg
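
The current value can also be read from /proc/sys/vm/swappiness, which holds the same setting that sysctl reports as vm.swappiness. A minimal Python sketch with hypothetical helper names; which lower value to choose instead of 60 is left to the admin:

```python
DEFAULT_SWAPPINESS = 60  # kernel default mentioned above


def read_swappiness(path="/proc/sys/vm/swappiness"):
    """Return the current vm.swappiness value as an int."""
    with open(path) as f:
        return int(f.read().strip())


def swappiness_note(value):
    """Describe a swappiness value relative to the kernel default."""
    if value == DEFAULT_SWAPPINESS:
        return f"vm.swappiness={value} (default)"
    if value < DEFAULT_SWAPPINESS:
        return f"vm.swappiness={value} (lower than default: less eager to swap)"
    return f"vm.swappiness={value} (higher than default: more eager to swap)"
```

On a Linux host, swappiness_note(read_swappiness()) summarizes the current setting.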


Hey Roy,

this seems to be the swap usage of the whole OS (judging by a quick peek at
the sigar lib), and thus it is not bound to the JVM process of the running
elasticsearch instance. If other processes on the system are swapped out,
that should be fine in many cases.

--Alex

On Sat, Dec 7, 2013 at 7:45 PM, Roy Russo royrusso@gmail.com wrote:

I have no idea how elastichq is measuring this to be honest, maybe Roy
can help here?

The swap space alert uses the following formula, taken from the
endpoint (_cluster/nodes/stats?all=1):

    formula:"stats.os.swap.used_in_bytes / 1024 / 1024",
    upper_limit:["1", "1"],
    comment:"Any use of swap by the JVM, no matter how small, can greatly impact the speed of the garbage collector."

It's merely a warning, as the comment states. If my formula is incorrect, or needs tuning, please let me know and I will change it accordingly. Honestly, this warning appears on every instance I deploy on windows. Haven't looked on Linux, so maybe my acceptable threshold is incorrect?


Hey Josh,

I think your points would justify dedicating a whole section of the
reference guide to this, to make things clearer. As I already fell over
this myself, I added process information to the nodes info output which
tells you whether the mlockall call was successful (so there is no need to
look into log files anymore).

Still, adding more documentation makes sense; I'll try to put it somewhere
into the documentation section over the coming week.
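
On versions that include this flag, the nodes info check can be automated. A minimal Python sketch, assuming each node in the nodes info response carries a "process" object with a boolean "mlockall" field (the field name is an assumption based on the description above; per this thread it is not in any 0.90 release); the helper name is hypothetical:

```python
import json


def nodes_without_mlockall(nodes_info_json):
    """Names of nodes whose nodes info does not report process.mlockall == true."""
    failed = []
    nodes = json.loads(nodes_info_json).get("nodes", {})
    for node_id, node in nodes.items():
        # A missing field is treated as "not locked" rather than as success.
        if node.get("process", {}).get("mlockall") is not True:
            failed.append(node.get("name", node_id))
    return sorted(failed)
```

Treating a missing field as a failure means that nodes on older versions, which do not report the flag, will also show up in the result.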

Thanks for your feedback!

--Alex

On Sat, Dec 7, 2013 at 7:18 PM, Joshua Harrison hijakk@gmail.com wrote:

I'd probably mention it in the default elasticsearch.yml file with a line
in the description of mlockall like:
If running as a linux service from an RPM build it (may be/is) necessary
to set LimitMEMLOCK=infinity in /etc/systemd/system/elasticsearch.service
Mostly because all of the documentation, suggestions, and Stack Overflow
answers just say "turn on mlockall", so having this note right next to that
setting may be helpful.

Alternatively, that error 0 probably happens in other circumstances too (I
don't know for sure), but it could be worth adding something like the above
to the log message when that error occurs, as a possible solution.

Finally, I figured out that the swap usage reported in elasticHQ and other
similar tools appears to be the swap usage of the entire host system, not
of ES. Since swap is a bad thing for ES, would it be possible to get a
property in _node that shows what ES itself is using?
Thanks Alex!
-Josh

On Dec 7, 2013, at 7:07 AM, Alexander Reelsen alr@spinscale.de wrote:

Hey Josh,

glad it is working now. Can you have a look at
Elasticsearch Platform — Find real-time answers at scale | Elastic tell me, if we can improve the docs somehow (or also mentioning it
differently in the sysconfig file maybe). Any pointer how we can make this
more failureproof? Mentioning it more obvious somewhere? (Except that I
should have thought about systemd earlier :wink:

Any help is much appreciated!

--Alex

On Fri, Dec 6, 2013 at 8:20 PM, Josh Harrison hijakk@gmail.com wrote:

Yep, using the RPM.
I ended up following the settings in here:
Limits are not consumed using Systemd (ulimit -n / ulimit -l) by skymeyer · Pull Request #3355 · elastic/elasticsearch · GitHub and now top
says I have a swap usage of 0, agreeing with /proc/pid/smaps - though
elastichq still reports swap usage, so I'm not sure what to make of that.
So for those of you that run into this in the future, try this:
After applying fix

Removing limits from /etc/security/limits.conf

elasticsearch soft nofile 65535
elasticsearch hard nofile 65535
elasticsearch - memlock unlimited

/etc/sysconfig/elasticsearch:

Maximum number of open files

MAX_OPEN_FILES=65535

Maximum amount of locked memory

MAX_LOCKED_MEMORY=unlimited

/etc/elasticsearch/elasticsearch.yml:

bootstrap.mlockall: true

Actual fix in /etc/systemd/system/elasticsearch.service

[Service]
...
LimitMEMLOCK=infinity
LimitNOFILE=65535

Note: run the following command after altering the above file:

systemctl --system daemon-reload

Result max_open_files and no mlockall error:

/etc/rc.d/init.d/elasticsearch restart
[2013-07-18 11:20:21,476][INFO ][bootstrap ] max_open_files [65511]
[2013-07-18 11:20:21,616][INFO ][node ] [node1] {0.90.2}[28558]: initializing ...

On Friday, December 6, 2013 1:13:34 AM UTC-8, Alexander Reelsen wrote:

Hey,

ok, so mlockall is configured but not set on startup - at least you know
why it is swapping.

How are you starting elasticsearch? Do you use the RPM? If so, I guess
you set MAX_LOCKED_MEMORY=unlimited in there? If not, can you try?
Another issue might be (depending on how ES gets started), that the
memlock setting is configured for the root user, but not for the
elasticsearch one (wild guessing here).

--Alex

On Fri, Dec 6, 2013 at 9:24 AM, Joshua Harrison hij...@gmail.comwrote:

At the moment, the only reason for the mixed environment is to provide
an easy migration path for when we burn down the windows node and reimage
to RHEL. If we can force the Windows node to be as stable as its linux
counterparts, we may just stick with it, though.

I am indeed getting an mlock error
[2013-12-05 21:25:34,439][WARN ][common.jna ] Unknown
mlockall error 0
ulimit -l unlimited has been set, which is the only suggestion I've
been able to find for this particular error.

Found this thread, Redirecting to Google Groups
elasticsearch/0CaBak7sdRE but it looks like they didn't find a
solution.
Thanks,
Josh

On Dec 6, 2013, at 12:09 AM, Alexander Reelsen a...@spinscale.de
wrote:

Hey,

I would not recommend running a mixed environment, as debugging will
become pretty much PITA when debugging different performance statistics of
operating systems - might be something for the really adventurous of us.

Now back to topic: If you enabled mlockall, but the elasticsearch
process is still swapping (can you please verify by checking top or the
/proc file system, I have no idea how elastichq is measuring this to be
honest, maybe Roy can help here?), mlockall seems not enabled. Can you
check the elasticsearch log file on startup, maybe there is an mlockall
error written out, which means, that it is not enabled (we recently added,
if the mlockall was successful on startup, so you can see it in the nodes
info, but it is not yet part of a 0.90 release).

--Alex

On Fri, Dec 6, 2013 at 8:43 AM, Josh Harrison hij...@gmail.com wrote:

So, for ES, swap is a bad thing, right?
I'm running, currently, a mixed Windows and Linux environment. All
running 0.90.5.
On linux (RHEL):
Locked memory has been set to unlimited. mlockall is (supposed to be)
on. ES_HEAP_SIZE is set to 12gb (24gb of ram on server).

Limits applied to the Linux process:

Max open files 65536 65536
files

Max locked memory unlimited unlimited
bytes
I still see swap space getting used, just a few megs at first, and
I've seen it get as far up as 800mb in elastichq

The only way I can get swap to turn off for ES is to turn off swap for
the whole system - which seems kinda overkill. File descriptor limits on
linux seem fine

What am I missing? I feel like there's probably one little thing that
will make it all click into place and I've just glazed over it.

On Windows
There is no mlockall. I briefly tried disabling the page file and
rebooting the server, but when ES came back up, it had its same 32GB swap
usage show up in elastichq as it did when the page file was around.
ES_HEAP_SIZE is set to 24GB, 64GB on the server but I was getting even more
instability - rivers dropping, response times to a simple query taking 5-10
minutes, etc, at the recommended 30GB. I still get to enjoy those things on
at 24GB, but not quite as often.
There is also, apparently, no functional way to check file descriptors
in use, or how many the process is allowed to open. I just get back the -1
unknown value. Do file descriptor counts not matter on Windows?

Is swap not a bad word on Windows as far as ES is concerned? If it is,
since it sounds like there are at least a few people out there running
production ES windows servers, how do I manage it properly?

Ideally, we'll be moving the Windows server to RHEL, but there may be
enough red tape in the way that it will take a while.

Thanks!

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/CAGCwEM9sR9B4ZKRqvipXzAj634Vdtv5vVGT%2Bzon1%2BCrzp9ijvA%40mail.gmail.com.
For more options, visit https://groups.google.com/groups/opt_out.