Elasticsearch 6.8.12 os.cpu always 100 although load varies and stays below number of processors

I have a small Elasticsearch cluster running on Kubernetes 1.18 in AKS. It has 3 ingest nodes, 3 masters and 4 data nodes. I do not have much data, only a few gigabytes spread over a dozen indices. It has been set up using Helm 3 and the elastic chart 6.8.12 [0].

When I run GET /_cat/nodes?v&s=load_1m:desc, you can see all my data nodes report 100 for os.cpu while the load varies between 2 and 4.

ip           heap.percent ram.percent cpu load_1m load_5m load_15m node.role master name
10.244.15.72           48         100 100    3.52    3.06     2.65 di        -      elasticsearch-test-data-2
10.244.0.237           59         100 100    3.21    3.03     2.87 di        -      elasticsearch-test-data-0
10.244.1.18            48          99 100    3.17    3.22     3.32 di        -      elasticsearch-test-data-1
10.244.17.9            69         100 100    2.72    2.67     3.02 di        -      elasticsearch-test-data-3
10.244.17.8            55          66   3    2.72    2.67     3.02 mi        *      elasticsearch-test-master-2
10.244.17.5            37          84  26    2.72    2.67     3.02 i         -      elasticsearch-test-client-2
10.244.18.42           41          81  28    0.62    0.70     0.63 i         -      elasticsearch-test-client-0
10.244.18.10           66          64   1    0.62    0.70     0.63 mi        -      elasticsearch-test-master-0
10.244.19.12           31          79  26    0.48    0.30     0.35 i         -      elasticsearch-test-client-1
10.244.19.51           18          64   1    0.48    0.30     0.35 mi        -      elasticsearch-test-master-1

I have confirmed the load values are likely correct in several ways:

  • executing top inside the pods for the data nodes, confirming the values match the reported load and that the java process accounts for pretty much all of it

  • executing top across the pods for the data nodes, confirming they are all pretty much the same

  • checking Application Insights reports on how much CPU my data nodes are using

I have configured Kubernetes resources to request 6 CPUs and 4Gi RAM; the limits are 8 CPUs and 4Gi RAM. I run on A8 machines with 8 CPUs, and I have confirmed that each data node runs on its own Kubernetes node with full access to the CPUs.

I confirmed the OS view of the CPUs by checking /proc/cpuinfo. I have also confirmed that the data nodes each have 6 processors configured. I have monitored my data nodes over the past week and seen their CPU usage move between 2 and 7 cores depending on load.
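As a further sanity check of what the JVM itself sees (a minimal sketch, not something from the cluster above; with container support enabled, newer JDKs derive this from the cgroup settings rather than the host):

```java
public class AvailableProcessors {
    public static void main(String[] args) {
        // With -XX:+UseContainerSupport (the default since JDK 10), this
        // reflects the container's CPU quota/shares, not the host's CPU count.
        System.out.println(Runtime.getRuntime().availableProcessors());
    }
}
```

Run inside the pod, this should print the processor count the JVM is working with.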

So all my information points to the os.cpu number being incorrect. When checking /_cluster/stats/nodes/<node>, I get the process.cpu value, and it corresponds very well with the load values and the process CPU value from inside the pod. When checking /_nodes/<node>/stats, I get both the os.cpu value (100) and the process.cpu value (which looks fine).

When looking at hot threads or the thread pool, I see nothing that bothers me. Of course there are some hot threads on some of the nodes but nothing blocking or locking for any significant amount of time.

Is it perhaps a bug in Elastic's reporting? Am I having issues with the OS view of the VM while running in Kubernetes? Can it be troublesome that my data nodes are also ingest nodes? How can I continue troubleshooting and hopefully, at some point, sleep well? Or am I just reading this CPU value the wrong way?

[0] https://github.com/elastic/helm-charts/blob/6.8.12/elasticsearch/README.md

The os.cpu number is just 100× whatever the JDK's OperatingSystemMXBean#getSystemCpuLoad method reports. What that really means depends on which JDK you're using; in JDK 14, at least, it works differently inside containers than outside:

In turn these numbers are read fairly directly from /proc/self/...:

Older JDKs were not so aware of containerised environments, so maybe you need a newer JDK to get accurate figures here?
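The value Elasticsearch reads can be reproduced in a few lines of Java (a sketch for checking from inside the pod; note that getSystemCpuLoad is deprecated in favour of getCpuLoad as of JDK 14, and whether either is cgroup-aware depends on the JDK):

```java
import java.lang.management.ManagementFactory;

public class OsCpu {
    public static void main(String[] args) {
        com.sun.management.OperatingSystemMXBean os =
                (com.sun.management.OperatingSystemMXBean)
                        ManagementFactory.getOperatingSystemMXBean();
        // Returns the recent system CPU load in [0.0, 1.0],
        // or a negative value if it is not available.
        double load = os.getSystemCpuLoad();
        // Multiplied by 100, this is what _cat/nodes shows as os.cpu.
        System.out.println(Math.round(load * 100));
    }
}
```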

This is very interesting indeed! Thanks for your input.

We are running version 14 on the nodes:

"version" : "14.0.1",
"vm_name" : "OpenJDK 64-Bit Server VM",
"vm_version" : "14.0.1+7",
"vm_vendor" : "AdoptOpenJDK",

As for the information under /proc/self/... I will have to do some more digging to connect the dots. I have not looked at this information before and need to learn more to understand what I am looking at.

Ok, things have moved slightly between your JDK and the version to which I linked above; the relevant code is here:

(not sure it's very different but it's probably best to look at the right version)

Wow, thanks. That made it much easier.

I parsed the stats manually and did the calculation by hand, and it always comes up above 2, which according to the code is then clamped to 1, i.e. 100%.

It seems to me that the calculation is off: it does not respect the quota. It checks for quota > 0 but then does not use it. In my case, the reported quota is 800000; with the period being 100000, that leaves 8 CPUs. If I factor my 8 CPUs into the elapsedNanos, my CPU percentage comes back at 0.25-0.3, as expected.
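To make the arithmetic concrete, here is a hypothetical sketch of the quota-respecting version of the calculation (the method and parameter names are mine, not the JDK's), using the values reported above: a quota of 800000 µs and a period of 100000 µs, i.e. 8 CPUs:

```java
public class CgroupCpuLoad {
    // Hypothetical quota-aware load: CPU time consumed, divided by the
    // wall-clock window scaled by the number of CPUs the quota allows.
    static double cpuLoad(long deltaUsageNanos, long elapsedNanos,
                          long quotaUs, long periodUs) {
        double cpus = (quotaUs > 0 && periodUs > 0)
                ? (double) quotaUs / periodUs   // 800000 / 100000 = 8
                : 1.0;
        double load = deltaUsageNanos / (elapsedNanos * cpus);
        return Math.min(load, 1.0);             // clamp at 100%
    }

    public static void main(String[] args) {
        // e.g. 2.4 s of CPU time over a 1 s window with an 8-CPU quota
        System.out.println(cpuLoad(2_400_000_000L, 1_000_000_000L,
                                   800_000, 100_000)); // prints 0.3, i.e. ~30%
    }
}
```

Without the division by cpus, the same numbers give 2.4, which is exactly the "above 2, clamped to 1" behaviour I am seeing.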

I verified this number with a CPU usage script I've used in the past [0].

So there seems to be a mismatch between what the JDK expects the system to report and what is actually reported. Any thoughts? Bug in the OS or bug in the JDK?

OS info:

"name" : "Linux",
"pretty_name" : "CentOS Linux 7 (Core)",
"arch" : "amd64",
"version" : "5.4.0-1034-azure",

[0] https://gist.github.com/pcolby/6558833

As it's outside Elasticsearch, your guess is as good as mine I'm afraid. It could also be something specific to AKS, since I imagine this is using a somewhat customised OS too.
