Bug Report: Memory leak when the readiness probe curls to check cluster health with security enabled

When I install Elasticsearch using "https://github.com/elastic/helm-charts/tree/master/elasticsearch" with security enabled, I observe a memory leak: memory usage of each pod keeps increasing until the pod is OOMKilled on reaching its resource limit.

I narrowed the issue down to the readiness probe, which makes a curl request to fetch the cluster health ("/", "/_cluster/health?") over HTTPS using the loopback IP address. This causes an exception in the pods and the memory leak occurs (with HTTP there is no memory leak). The exception is copied below. When this curl request is disabled in the readiness probe, the cluster works fine and there is no memory leak.

"at org.elasticsearch.xpack.security.authc.support.CachingUsernamePasswordRealm.authenticateWithCache(CachingUsernamePasswordRealm.java:166) [x-pack-security-7.2.0.jar:7.2.0]",
"at org.elasticsearch.xpack.security.authc.support.CachingUsernamePasswordRealm.authenticate(CachingUsernamePasswordRealm.java:103) [x-pack-security-7.2.0.jar:7.2.0]",
"at org.elasticsearch.xpack.security.authc.AuthenticationService$Authenticator.lambda$consumeToken$15(AuthenticationService.java:364) [x-pack-security-7.2.0.jar:7.2.0]",
"at org.elasticsearch.xpack.core.common.IteratingActionListener.run(IteratingActionListener.java:102) [x-pack-core-7.2.0.jar:7.2.0]",
"at org.elasticsearch.xpack.security.authc.AuthenticationService$Authenticator.consumeToken(AuthenticationService.java:407) [x-pack-security-7.2.0.jar:7.2.0]",
"at org.elasticsearch.xpack.security.authc.AuthenticationService$Authenticator.lambda$extractToken$11(AuthenticationService.java:335) [x-pack-security-7.2.0.jar:7.2.0]",
"at org.elasticsearch.xpack.security.authc.AuthenticationService$Authenticator.extractToken(AuthenticationService.java:345) [x-pack-security-7.2.0.jar:7.2.0]",
"at org.elasticsearch.xpack.security.authc.AuthenticationService$Authenticator.lambda$checkForApiKey$3(AuthenticationService.java:288) [x-pack-security-7.2.0.jar:7.2.0]",
"at org.elasticsearch.action.ActionListener$1.onResponse(ActionListener.java:62) [elasticsearch-7.2.0.jar:7.2.0]",
"at org.elasticsearch.xpack.security.authc.ApiKeyService.authenticateWithApiKeyIfPresent(ApiKeyService.java:360) [x-pack-security-7.2.0.jar:7.2.0]",
"at org.elasticsearch.xpack.security.authc.AuthenticationService$Authenticator.checkForApiKey(AuthenticationService.java:269) [x-pack-security-7.2.0.jar:7.2.0]",
"at org.elasticsearch.xpack.security.authc.AuthenticationService$Authenticator.lambda$authenticateAsync$0(AuthenticationService.java:252) [x-pack-security-7.2.0.jar:7.2.0]",
"at org.elasticsearch.action.ActionListener$1.onResponse(ActionListener.java:62) [elasticsearch-7.2.0.jar:7.2.0]",
"at org.elasticsearch.xpack.security.authc.TokenService.getAndValidateToken(TokenService.java:390) [x-pack-security-7.2.0.jar:7.2.0]",
"at org.elasticsearch.xpack.security.authc.AuthenticationService$Authenticator.lambda$authenticateAsync$2(AuthenticationService.java:248) [x-pack-security-7.2.0.jar:7.2.0]",
"at org.elasticsearch.xpack.security.authc.AuthenticationService$Authenticator.lambda$lookForExistingAuthentication$6(AuthenticationService.java:306) [x-pack-security-7.2.0.jar:7.2.0]",
"at org.elasticsearch.xpack.security.authc.AuthenticationService$Authenticator.lookForExistingAuthentication(AuthenticationService.java:317) [x-pack-security-7.2.0.jar:7.2.0]",
"at org.elasticsearch.xpack.security.authc.AuthenticationService$Authenticator.authenticateAsync(AuthenticationService.java:244) [x-pack-security-7.2.0.jar:7.2.0]",
"at org.elasticsearch.xpack.security.authc.AuthenticationService$Authenticator.access$000(AuthenticationService.java:196) [x-pack-security-7.2.0.jar:7.2.0]",
"at org.elasticsearch.xpack.security.authc.AuthenticationService.authenticate(AuthenticationService.java:122) [x-pack-security-7.2.0.jar:7.2.0]",
"at org.elasticsearch.xpack.security.rest.SecurityRestFilter.handleRequest(SecurityRestFilter.java:55) [x-pack-security-7.2.0.jar:7.2.0]",
"at org.elasticsearch.rest.RestController.dispatchRequest(RestController.java:240) [elasticsearch-7.2.0.jar:7.2.0]",
"at org.elasticsearch.rest.RestController.tryAllHandlers(RestController.java:337) [elasticsearch-7.2.0.jar:7.2.0]",
"at org.elasticsearch.rest.RestController.dispatchRequest(RestController.java:174) [elasticsearch-7.2.0.jar:7.2.0]",
"at org.elasticsearch.http.AbstractHttpServerTransport.dispatchRequest(AbstractHttpServerTransport.java:320) [elasticsearch-7.2.0.jar:7.2.0]",
"at org.elasticsearch.http.AbstractHttpServerTransport.handleIncomingRequest(AbstractHttpServerTransport.java:370) [elasticsearch-7.2.0.jar:7.2.0]",
"at org.elasticsearch.http.AbstractHttpServerTransport.incomingRequest(AbstractHttpServerTransport.java:299) [elasticsearch-7.2.0.jar:7.2.0]",
"at org.elasticsearch.http.netty4.Netty4HttpRequestHandler.channelRead0(Netty4HttpRequestHandler.java:66) [transport-netty4-client-7.2.0.jar:7.2.0]",
"at org.elasticsearch.http.netty4.Netty4HttpRequestHandler.channelRead0(Netty4HttpRequestHandler.java:31) [transport-netty4-client-7.2.0.jar:7.2.0]",
"at io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:105) [netty-transport-4.1.35.Final.jar:4.1.35.Final]",
"at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:374) [netty-transport-4.1.35.Final.jar:4.1.35.Final]",
"at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:360) [netty-transport-4.1.35.Final.jar:4.1.35.Final]",
"at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:352) [netty-transport-4.1.35.Final.jar:4.1.35.Final]",
"at org.elasticsearch.http.netty4.Netty4HttpPipeliningHandler.channelRead(Netty4HttpPipeliningHandler.java:58) [transport-netty4-client-7.2.0.jar:7.2.0]",
"at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:374) [netty-transport-4.1.35.Final.jar:4.1.35.Final]",
"at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:360) [netty-transport-4.1.35.Final.jar:4.1.35.Final]",
"at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:352) [netty-transport-4.1.35.Final.jar:4.1.35.Final]",
"at io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:102) [netty-codec-4.1.35.Final.jar:4.1.35.Final]",
"at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:374) [netty-transport-4.1.35.Final.jar:4.1.35.Final]",
"at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:360) [netty-transport-4.1.35.Final.jar:4.1.35.Final]",

If the OOM killer is stopping the process then you have a misconfiguration. Your node's heap size must be no more than 50% of the RAM available to the process. If you set the heap size greater than 50% then the OOM killer may run before the JVM garbage collector frees some unused memory, leading to the effect you describe.
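As a rough sketch of that sizing rule, the arithmetic can be spelled out directly; the 4096MiB limit below is just an illustrative value, not taken from this thread:

```shell
# Sketch of the 50% sizing rule: given a container memory limit, the
# largest recommended heap leaves at least half the RAM for off-heap
# memory (direct buffers, filesystem cache, JVM overhead).
limit_mib=4096                      # hypothetical pod memory limit (4Gi)
max_heap_mib=$(( limit_mib / 2 ))   # heap must be no more than 50%
echo "limit=${limit_mib}MiB max_heap=${max_heap_mib}MiB"
```

Setting the heap above this value does not fail immediately; it just removes the headroom the garbage collector relies on, which is why the failure shows up later as an OOM kill rather than an OutOfMemoryError.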

If you configure your heap size correctly and still see something that looks like a memory leak then the process should exit with an OutOfMemoryError and write a heap dump to disk. If you can get this to happen then can you share the logs and the heap dump?

I have a proper configuration; the heap size is not more than 50%.

The issue occurs when the readiness probe sends a curl request over HTTPS to check cluster health.
When we use an HTTP request, there is no memory leak.
I have confirmed this.

The memory usage increases gradually, and when it reaches the limit the pod is OOMKilled (which is expected).

How can I share the logs? Only images can be attached here.

To be clear, this is very much not expected in a properly-configured system. You should be hitting the JVM's limits (i.e. seeing an OutOfMemoryError) a long time before the OOM killer comes into play.

You can use http://gist.github.com to share more extensive logs. I would like to see all the logs from when the node started until when it is shut down by the OOM killer. I would also like to see the full logs from the OOM killer itself, which is typically a few hundred lines reported by dmesg.
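A hedged way to pull out just the relevant kernel-log lines is a simple grep; the inline sample below stands in for live `dmesg` output, and on the node itself you would pipe `dmesg` into the same filter:

```shell
# Filter OOM-killer activity out of a kernel log. The sample text is a
# stand-in for real `dmesg` output so the filter can be demonstrated
# without access to the node.
sample='[362534.835364] Memory cgroup out of memory: Kill process 749
[362534.838772] Killed process 29843 (java) total-vm:5422060kB
[362534.900000] unrelated kernel message'
printf '%s\n' "$sample" | grep -iE 'out of memory|oom|killed process'
```

For diagnosis, though, the full few-hundred-line OOM-killer report is what matters, so share the unfiltered output and use a filter like this only to locate the incident.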

My concern is why this occurs only when I use the "https" protocol in the curl request.
It does not occur with an "http" curl request. If that is the observation, how is it related to configuration?

If I eliminate this curl request from the readiness probe, there is no memory leak and the cluster works properly.

For your reference : https://github.com/elastic/helm-charts/blob/master/elasticsearch/templates/statefulset.yaml

From line 141 to 175 you can see the readiness probe that contains the curl request. It causes no issue when the protocol is HTTP, but causes a memory leak when the protocol is HTTPS.
This is my concern: how is this related to configuration?
(If I do not use this readiness probe with the HTTPS curl request, there is no issue at all.)

I have uploaded the logs to

Please check.

We also need the dmesg output.

How much resources are you allocating to the container?

I tried giving up to 10Gi of memory with a 4Gi heap size.
Memory usage by the pods still gradually increases and touches 10Gi.

The logs you shared show the nodes with a max heap size of 1GB, so we would expect to avoid the OOM killer if the pod has more than 2GB available. If that isn't the case then there's something really seriously wrong somewhere. We really need to see the logs from the nodes from startup until they are killed and the dmesg output from the same incident. Please don't try lots of different settings to try and see what works. It's important to focus on a particular case.

Yes, we have more than 2GB allocated to the pods.
Sure, we are concentrating on this particular case only. Please give me some time to share the dmesg logs.

Please find the pods' memory usage behaviour in Grafana. It shows the memory usage from the time of deployment until it touches the limit.

With curl over HTTP only, the graph is almost steady at 1.24 to 1.30Gi.
With curl over HTTPS, the graph goes up gradually.

Please find the dmesg log for a worker node that has a master pod and a data pod.

Great, that's helpful. Picking the last invocation of the OOM killer I see this:

[362534.819807] [ pid ]   uid  tgid total_vm      rss nr_ptes swapents oom_score_adj name
[362534.823071] [25311]     0 25311      253        1       4        0          -998 pause
[362534.826170] [29843]  1000 29843  1355515   330141     755        0           872 java
[362534.829152] [28031]  1000 28031     2921      349      11        0           872 sh
[362534.832316] [28041]  1000 28041    21895     1344      47        0           872 curl
[362534.835364] Memory cgroup out of memory: Kill process 749 (elasticsearch[s) score 1187 or sacrifice child
[362534.838772] Killed process 29843 (java) total-vm:5422060kB, anon-rss:1298496kB, file-rss:22068kB, shmem-rss:0kB

The numbers in the table are in 4kB pages, so the java process there has 330141 pages resident, which is 1.25GiB, well within the expected 2GiB limit. So this raises the very pertinent question of why the OOM killer is being invoked at all:
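The page arithmetic can be checked directly; 330141 pages at 4KiB each comes out near the 1.25GiB quoted above:

```shell
# Worked check: resident pages x 4KiB per page, converted up to GiB.
pages=330141
kib=$(( pages * 4 ))
echo "${kib} KiB"    # prints "1320564 KiB"
awk -v k="$kib" 'BEGIN { printf "%.2f GiB\n", k / 1024 / 1024 }'  # prints "1.26 GiB"
```

Either way, the resident size is far below the ~2GiB the pod should allow for a 1GiB heap, which is what makes the OOM kill surprising.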

[362534.690340] SLUB: Unable to allocate memory on node -1 (gfp=0xd0)
[362534.693131]   cache: dentry(734:dff61daf575d1abc2b87773b3bc8a99be315ddbe32e82717d3595de6f1bd975f), object size: 192, buffer size: 192, default order: 0, min order: 0
[362534.699060]   node 0: slabs: 671549, objs: 14102529, free: 0
[362534.702606] SLUB: Unable to allocate memory on node -1 (gfp=0xd0)
[362534.705325]   cache: dentry(734:dff61daf575d1abc2b87773b3bc8a99be315ddbe32e82717d3595de6f1bd975f), object size: 192, buffer size: 192, default order: 0, min order: 0
[362534.711052]   node 0: slabs: 671552, objs: 14102592, free: 0

I see that you're running CentOS, which has known bugs that can yield this sort of issue. Are you running an affected version?

Thanks David. That's really a great finding.
Below are the version details of the components we use.

CentOS Linux release 7.6.1810 (Core)
Kernel : 3.10.0-957.12.2.el7.x86_64
(Not sure whether this is an affected version.)

Docker : 18.09.7
Kubernetes: v1.14.1

Also, could you please clarify why the memory limit is exceeded only when "https" is used? How is it triggered when we curl the pod over "https"?

The memory allocation pattern is rather different between the handlers for HTTP and HTTPS, because of the extra work needed to support HTTPS, and it looks like the allocation pattern for HTTPS is hitting a bug more frequently.

To be clear, we haven't seen any clearly surprising behaviour of Elasticsearch here. If it's being killed by the OOM killer when it has only 1.25GiB resident then the RAM available to the process is no greater than 1.25GiB, and the heap size was definitely set higher than 50% of that.

One question: when you say you switched to HTTP, do you mean that you disabled security on your whole cluster, or just that you changed the URL that curl uses from https://localhost/ to http://localhost/?

That's a good question. Let me clarify on that.

  1. Without security enabled (no xpack/https/ssl-related config in Elasticsearch), if I make an HTTPS curl request, this memory leak happens.

  2. With security enabled, if I don't curl over HTTPS, there is no memory leak. (This behaviour is quite surprising.)

Considering that HTTPS needs some extra work and the allocation pattern is different, why does the leak not happen when the entire cluster works over HTTPS without any memory issue (provided I don't make the HTTPS curl in the readiness probe, as explained in point 2)?

So I can narrow the issue down to this: the memory leak happens if I make an HTTPS curl request in the readiness probe (regardless of whether security is enabled).

Below are the two curl commands used in the readiness probe:
curl -XGET -s -k --fail
curl -XGET -s -k --fail

I hope the above points give you a better understanding of the issue.

cpu: "1000m"
memory: "4Gi"

esJavaOpts: "-Xmx1g -Xms1g"

These were the configuration I used for the test corresponding to the dmesg i shared.

To be quite clear, you haven't yet shown any evidence of a memory leak. We have seen the memory used by the Elasticsearch process increase a little over time, but it's nowhere near its limit. This is completely normal behaviour.

I ran an Elasticsearch 7.2.0 node with the default settings (1GiB heap) and no security and then executed this:

while (curl -XGET -s -k --fail && curl -XGET -s -k --fail) > /dev/null; do : ; done

The RSS for the Elasticsearch process increased over time and stabilised at around 1.31GiB.

I ran an Elasticsearch 7.2.0 node with the default settings (1GiB heap) but with security enabled and then executed this:

while (curl -XGET --user elastic:REDACTED -s -k --fail && curl -XGET --user elastic:REDACTED -s -k --fail) > /dev/null; do : ; done

Again, the RSS for the process increased over time and stabilised at around 1.32GiB.

I ran the same experiments with 3 nodes running in a cluster, with essentially identical results. This is completely within expectations.
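A minimal way to reproduce this kind of observation is to sample a process's resident set size over time: growth that stabilises is normal JVM behaviour, while unbounded growth would suggest a leak. This sketch samples the current shell (`$$`) purely as a stand-in for the Elasticsearch java pid:

```shell
# Sample the RSS (in KiB) of a process a few times. In practice you
# would substitute the Elasticsearch java pid for "$$".
pid=$$
for i in 1 2 3; do
    ps -o rss= -p "$pid"
    sleep 1
done
```

Plotting such samples over hours is essentially what the Grafana graphs earlier in the thread show, and the distinction between "rises then plateaus" and "rises until killed" is the whole question here.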

I should also point out that on at least one occasion the OOM killer killed a curl process, which was the largest memory consumer despite only having 1.5MB resident:

[362441.054645] [ pid ]   uid  tgid total_vm      rss nr_ptes swapents oom_score_adj name
[362441.058226] [24821]     0 24821      253        1       4        0          -998 pause
[362441.061816] [22798]  1000 22798     2921      351      11        0           872 sh
[362441.065402] [22804]  1000 22804    21925     1410      48        0           872 curl
[362441.069493] Memory cgroup out of memory: Kill process 22804 (curl) score 872 or sacrifice child
[362441.073248] Killed process 22804 (curl) total-vm:87700kB, anon-rss:1568kB, file-rss:4072kB, shmem-rss:0kB

I really don't think this is something that Elasticsearch can fix.