Quick update,
As much for myself as for anybody else who comes across this problem in the future.
We moved both master and query nodes to use 70% of our calculated
‘usable_memory’.
Things seem stable now.
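Expressed in the heap-percentage attributes quoted further down in this thread, the change amounts to roughly the following (a sketch, not the exact Chef attribute file; data nodes are assumed unchanged at 50):
- masters: "java_min_heap_pct_of_usable_memory": 70
- query: "java_min_heap_pct_of_usable_memory": 70
- data: "java_min_heap_pct_of_usable_memory": 50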
We are still concerned about being able to maximize the Java heap size on our query (aka coordinator) nodes. The master nodes are not such a big deal.
We also discovered that our Ops team had set vm.swappiness=0 while we were also running Java with mlockall, which was an unexpected new scenario.
At this time my best guess is that we are just triggering the same old long-standing Linux bug with thrashing on memory page compression vs. disk IO.
Our next step will be to run Java with mlockall and vm.swappiness=1, and then from there start trying to use memory more aggressively again.
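For reference, the combination we plan to run next looks roughly like this (a sketch only; bootstrap.mlockall is the ES 1.x setting name, and the exact file paths depend on the install):
  # /etc/sysctl.conf (or: sysctl -w vm.swappiness=1)
  vm.swappiness = 1
  # /etc/elasticsearch/elasticsearch.yml
  bootstrap.mlockall: true
  # /etc/security/limits.conf, so the elasticsearch user can lock its heap
  elasticsearch - memlock unlimited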
On Oct 9, 2014, at 12:24 PM, Michael deMan (ES) elasticsearch@deman.com
wrote:
Hi Jörg,
We tune the Java heap size against what we think is 'usable' memory, not system memory, specifically to reserve space for other processes like the Java app itself, Chef, Splunk, etc.
The formula we have right now is:
- masters: "java_min_heap_pct_of_usable_memory": 100
- data: "java_min_heap_pct_of_usable_memory": 50
- query: "java_min_heap_pct_of_usable_memory": 100
where: usable_memory_mb = ((host_memory_mb - 600) * 0.9).floor
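As a worked example, assuming a nominal 4 GB master that reports roughly 3832 MB of host memory: usable_memory_mb = floor((3832 - 600) * 0.9) = floor(2908.8) = 2908, so at 100% that master gets the ES_HEAP_SIZE=2908m quoted further down in this thread.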
I have been thinking the next logical step for us is to put our master/query nodes back at 50% heap size usage, pound them with load tests, and wait and watch. If nothing else, we are back in alignment with the ES best-practices guidelines; if the problem goes away we have solved it, and if it stays around we can dig back into it.
Thanks for the help,
On Oct 9, 2014, at 11:01 AM, joergprante@gmail.com wrote:
The thought of "big disk caching" is correct, but you should be aware that this is a simplification of the concrete situation.
Elasticsearch uses much more RAM than the configured heap value - you must leave space for internal "direct" buffers, stacks, classes, libraries, etc., and also for the kernel and the OS to live in.
So if you configure 2908m for the heap and enable mlockall, but have just 4 GB of RAM while the kernel and OS processes also need space, then you will have severe RAM congestion.
Rules of thumb:
- set the ES heap size to around 50% of total RAM, but not less than 1 GB and not more than 32 GB (due to JVM garbage collector performance)
- if the RAM left over is less than 2 GB and mlockall is enabled, the risk of RAM contention is high; in this case, decrease the ES heap size until 2 GB of RAM is available, or set an ES direct memory allocation limit (see the sketch after this list)
- if there are other processes running, do not use "total RAM" but "available RAM" to find the maximum ES heap size, to ensure other processes can continue to run without coming under memory pressure (it is recommended to run ES without any other processes)
- the total process space of ES might increase significantly over time if there is no configured limit for direct memory buffer allocation
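A minimal sketch of capping both pools on an ES 1.x install, assuming the stock startup script maps ES_DIRECT_SIZE to -XX:MaxDirectMemorySize (if yours does not, the JVM flag can go into JAVA_OPTS directly; values below are illustrative only):
  ES_HEAP_SIZE=2g
  ES_DIRECT_SIZE=512m    # becomes -XX:MaxDirectMemorySize=512m
  # or set by hand:
  # JAVA_OPTS="$JAVA_OPTS -XX:MaxDirectMemorySize=512m"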
Jörg
On Thu, Oct 9, 2014 at 7:37 PM, Michael deMan (ES) <elasticsearch@deman.com> wrote:
Also,
For our data nodes we follow best practices with 50% of memory for the Java heap, while for our master and query nodes we allocate a higher percentage, with the thought that they really do not need big disk caching. Could that be our problem?
In addition, the systems are not actually swapping - no swap is in use; the kswapd process just runs away at 100% CPU.
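One way to confirm that picture (a generic Linux diagnostic sketch, nothing ES-specific) is to watch reclaim activity while swap stays idle:
  vmstat 1          # si/so columns should stay at 0 even while kswapd spins
  free -m           # swap totals
  grep -E 'pgscan|pgsteal|compact' /proc/vmstat   # scan/compaction counters climbing with no swap traffic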
We are on:
java version "1.7.0_17"
Java(TM) SE Runtime Environment (build 1.7.0_17-b02)
Java HotSpot(TM) 64-Bit Server VM (build 23.7-b01, mixed mode)
Elasticsearch 1.3.2.
Thanks in advance for any pointers; hopefully somebody has seen this before and knows the quick fix.
On Oct 9, 2014, at 10:23 AM, Michael deMan (ES) elasticsearch@deman.com
wrote:
Hi All,
This is a bit off topic, but we only see this on some of our Elasticsearch hosts, and it is also the only place where we enable mlockall for Java, which we understand to be a strongly recommended best practice.
Basically, from time to time we see kswapd run away at 100% on a single core.
It seems to hit our master nodes more frequently, and they also have the least amount of memory.
The masters are:
CentOS 6.4
4GB RAM
4GB swap
ES_HEAP_SIZE=2908m
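As a quick sanity check that mlockall actually took effect on these nodes (assuming the default HTTP port), the nodes API reports it:
  curl -s 'http://localhost:9200/_nodes/process?pretty' | grep mlockall
  # expected: "mlockall" : true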
Does anybody know much about this and how to prevent it?
We have hunted through Google Groups, but have not really found the magic bullet.
We have considered turning off swap and seeing what happens in the lab, but we would prefer not to do that unless it is well known to be the correct solution.
Thanks,