Quick update,
As much for myself as for anybody else who comes across this problem in the future.
We moved both master and query nodes to use 70% of our calculated
‘usable_memory’.
Things seem stable now.
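Expressed in the heap-percentage attributes quoted further down in this thread, the change amounts to roughly the following (a sketch, not the exact Chef attribute file; data nodes are assumed unchanged at 50):
- masters: "java_min_heap_pct_of_usable_memory": 70
- query: "java_min_heap_pct_of_usable_memory": 70
- data: "java_min_heap_pct_of_usable_memory": 50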
We are still concerned about being able to maximize the Java heap size on our query (aka coordinator) nodes. The master nodes are not such a big deal.
We also discovered that our Ops team had set vm.swappiness=0 while we were also running Java with mlockall, which was an unexpected new scenario.
At this time my best guess is that we are just triggering the same old long-standing Linux bug with thrashing on memory page compression vs. disk IO.
Our next step will be to run Java with mlockall and vm.swappiness=1, and then from there start trying to use memory more aggressively again.
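For reference, the combination we plan to run next looks roughly like this (a sketch only; bootstrap.mlockall is the ES 1.x setting name, and the exact file paths depend on the install):
  # /etc/sysctl.conf (or: sysctl -w vm.swappiness=1)
  vm.swappiness = 1
  # /etc/elasticsearch/elasticsearch.yml
  bootstrap.mlockall: true
  # /etc/security/limits.conf, so the elasticsearch user can lock its heap
  elasticsearch - memlock unlimited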
On Oct 9, 2014, at 12:24 PM, Michael deMan (ES) elasticsearch@deman.com
wrote:
Hi Jörg,
We tune the Java heap size against what we think is 'usable' memory, not system memory, specifically to reserve space for other processes like the Java app itself, Chef, Splunk, etc.
The formula we have right now is:
- masters: "java_min_heap_pct_of_usable_memory": 100
- data: "java_min_heap_pct_of_usable_memory": 50
- query: "java_min_heap_pct_of_usable_memory": 100
where: usable_memory_mb = ((host_memory_mb - 600) * 0.9).floor
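As a worked example, assuming a nominal 4 GB master that reports roughly 3832 MB of host memory: usable_memory_mb = floor((3832 - 600) * 0.9) = floor(2908.8) = 2908, so at 100% that master gets the ES_HEAP_SIZE=2908m quoted further down in this thread.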
I have been thinking the next logical step for us is to put our master/query nodes back at 50% heap size usage, pound them with load tests, and wait and watch. If nothing else, we are back in alignment with the ES best-practices guidelines; if the problem goes away we have solved it, and if it stays around we can dig back into it.
Thanks for the help,
On Oct 9, 2014, at 11:01 AM, joergprante@gmail.com wrote:
The thought of "big disk caching" is correct, but you should be aware that this is a simplification of the concrete situation.
Elasticsearch uses much more RAM than the configured heap value - you must leave space for internal "direct" buffers, stacks, classes, libraries, etc., and also for the kernel and the OS to live in.
So if you configure 2908m for the heap and enable mlockall, but have just 4 GB of RAM while the kernel and OS processes also need space, then you will have severe RAM congestion.
Rules of thumb:
- set the ES heap size to around 50% of total RAM, but not less than 1 GB and not more than 32 GB (due to JVM garbage collector performance)
- if the RAM left over is less than 2 GB and mlockall is enabled, the risk of RAM contention is high; in this case, decrease the ES heap size until 2 GB of RAM is available, or set an ES direct memory allocation limit (see the sketch after this list)
- if there are other processes running, do not use "total RAM" but "available RAM" to find the maximum ES heap size, to ensure other processes can continue to run without coming under memory pressure (it is recommended to run ES without any other processes)
- the total process space of ES might increase significantly over time if there is no configured limit for direct memory buffer allocation
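A minimal sketch of capping both pools on an ES 1.x install, assuming the stock startup script maps ES_DIRECT_SIZE to -XX:MaxDirectMemorySize (if yours does not, the JVM flag can go into JAVA_OPTS directly; values below are illustrative only):
  ES_HEAP_SIZE=2g
  ES_DIRECT_SIZE=512m    # becomes -XX:MaxDirectMemorySize=512m
  # or set by hand:
  # JAVA_OPTS="$JAVA_OPTS -XX:MaxDirectMemorySize=512m"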
Jörg
On Thu, Oct 9, 2014 at 7:37 PM, Michael deMan (ES) <elasticsearch@deman.com> wrote:
Also,
For our data nodes we follow best practices with 50% of memory for the Java heap, while for our master and query nodes we allocate a higher percentage, with the thought that they really do not need big disk caching. Could that be our problem?
In addition, the systems are not actually swapping - no swap is in use; the kswapd process just runs away at 100% CPU.
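One way to confirm that picture (a generic Linux diagnostic sketch, nothing ES-specific) is to watch reclaim activity while swap stays idle:
  vmstat 1          # si/so columns should stay at 0 even while kswapd spins
  free -m           # swap totals
  grep -E 'pgscan|pgsteal|compact' /proc/vmstat   # scan/compaction counters climbing with no swap traffic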
We are on:
java version "1.7.0_17"
Java(TM) SE Runtime Environment (build 1.7.0_17-b02)
Java HotSpot(TM) 64-Bit Server VM (build 23.7-b01, mixed mode)
Elasticsearch 1.3.2.
Thanks in advance for any pointers; hopefully somebody has seen this before and knows the quick fix.
On Oct 9, 2014, at 10:23 AM, Michael deMan (ES) elasticsearch@deman.com
wrote:
Hi All,
This is a bit off topic, but we only see this on some of our Elasticsearch hosts, and it is also the only place where we enable mlockall for Java, which we understand to be a strongly recommended best practice.
Basically, from time to time we see kswapd run away at 100% on a single core.
It seems to hit our master nodes more frequently, and they also have the least amount of memory.
The masters are:
CentOS 6.4
4GB RAM
4GB swap
ES_HEAP_SIZE=2908m
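As a quick sanity check that mlockall actually took effect on these nodes (assuming the default HTTP port), the nodes API reports it:
  curl -s 'http://localhost:9200/_nodes/process?pretty' | grep mlockall
  # expected: "mlockall" : true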
Does anybody know much about this and how to prevent it?
We have hunted through Google Groups, but have not really found the magic bullet.
We have considered turning off swap and seeing what happens in the lab, but we would prefer not to do that unless it is well known to be the correct solution.
Thanks,