OS setting "transparent huge pages"

Hi.
I've come across the transparent_hugepage setting, which is enabled by default in newer Linux distributions. It hurts all kinds of databases (Redis, Oracle, TokuDB), which is no surprise, as Red Hat says that "THP is not recommended for database workloads". What is your experience with this setting and Elasticsearch, and what do you recommend?
You can check it with
cat /sys/kernel/mm/transparent_hugepage/enabled
always [madvise] never
I plan to change it to "never".
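For reference, a minimal sketch of disabling it at runtime, assuming the same sysfs path as above (run as root; the defrag knob may not exist on every kernel, and the change does not persist across a reboot):
# turn THP off for new allocations
echo never > /sys/kernel/mm/transparent_hugepage/enabled
# also stop the kernel from defragmenting memory into huge pages
echo never > /sys/kernel/mm/transparent_hugepage/defrag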

From my one limited data point (the Lucene nightly benchmarks at http://people.apache.org/~mikemccand/lucenebench/indexing.html; see annotations AV and AW), leaving THP enabled gives faster indexing throughput than disabling it.

I never did a more thorough test ... and it could be that if I had enabled explicit huge pages in the OS and JVM (rather than letting them be "transparent"), performance would have been even higher than with THP.

So be careful because my experience was the opposite ...

If you set the JVM heap with -Xms and -Xmx, which is what ES_HEAP_SIZE manages, ES allocates the heap all at once at startup and does not try to release it back to the OS while running. With mlockall enabled, the ES process memory is not exposed to swap. Hence, there is not much communication between the Linux kernel and the ES JVM process about memory allocation and release. Newer Linux kernels do not trust JVM processes that demand large amounts of memory and allocate memory pages lazily, only when a page is first touched. If you want to convince the Linux kernel of the memory demand of the ES process, add -XX:+AlwaysPreTouch to the JVM options.
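To illustrate, a sketch of the settings described above, assuming the ES 1.x/2.x conventions with ES_HEAP_SIZE and bootstrap.mlockall (the 8g heap size is just a placeholder; adjust for your hardware):
# /etc/default/elasticsearch (or equivalent): size the heap once, up front
ES_HEAP_SIZE=8g                          # sets -Xms and -Xmx to the same value
ES_JAVA_OPTS="-XX:+AlwaysPreTouch"       # touch every heap page at startup
# elasticsearch.yml: keep the heap out of swap
bootstrap.mlockall: true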

The peculiarities of THP come from an odd effect: huge memory pages begin to swap when the Linux kernel assumes the JVM process does not need all of its memory pages and starts to move them around. With the above settings, the chance that this really happens to ES is small.

So in the majority of cases (note: besides ES, there should be no other large application running on the same host), you can ignore THP being enabled.

Thanks for sharing these benchmarks, they're a great help. I will be very careful now.

From what I've read, THP affects performance even without swapping.

You can check this with cat /proc/vmstat

If the numbers in thp_split and thp_collapse_alloc correlate with pgscan_direct_normal, you might have a THP-related problem. If thp_split and thp_collapse_alloc are small in comparison to pgscan_direct_normal, there is no reason to be concerned. The commands below show one way to pull those counters.
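A minimal sketch for extracting just those counters (counter names can vary slightly between kernel versions):
# show the THP and direct-reclaim counters from the kernel's VM statistics
grep -E 'thp_split|thp_collapse_alloc|pgscan_direct' /proc/vmstat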


Thanks. I checked, and on the busy machines we have a ratio of pgscan_direct_normal:thp_collapse_alloc of 1:1000. I'll tell you my results when I'm done with my testing.