100% CPU usage on data nodes

Hi guys. We currently run ECK with 3 master nodes and 4 data nodes.
Our data nodes jump to 100% CPU on some queries (a simple query on a single keyword field).
We are using ECK 1.0.1 and Elasticsearch 7.6.
The configuration is as follows:

Resources for the data nodes:

    esJavaOpts: "-Xms8000m -Xmx8000m -XX:MaxMetaspaceSize=1G"
    resources:
      requests:
        cpu: 3000m
        memory: 16000Mi
      limits:
        cpu: 3000m
        memory: 16000Mi
    storage: 1000Gi

Index info:
(One index per day, but we mostly query a single day)

Primaries: 4
Docs Count: 155582677
Storage Size: 221.2gb
Replicas: 2
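
For reference, the shard layout implied by these numbers can be worked out with a few lines of Python. This is only a rough sketch: the function name `shard_layout` is made up for illustration, and since the post does not say whether "Storage Size" covers primaries only or all copies, both readings are computed.

```python
def shard_layout(primaries, replicas, data_nodes, storage_gb, docs):
    """Back-of-the-envelope shard numbers for one daily index."""
    # Each primary has `replicas` extra copies.
    copies = primaries * (1 + replicas)
    return {
        "shard_copies": copies,
        # Assumes copies are spread evenly over the data nodes.
        "copies_per_node": copies / data_nodes,
        "docs_per_primary": docs // primaries,
        # Assumption A: the reported size counts primaries only.
        "gb_per_shard_if_primaries_only": storage_gb / primaries,
        # Assumption B: the reported size counts primaries and replicas.
        "gb_per_shard_if_total": storage_gb / copies,
    }


# Values taken from the index info above.
layout = shard_layout(primaries=4, replicas=2, data_nodes=4,
                      storage_gb=221.2, docs=155_582_677)
for key, value in layout.items():
    print(f"{key}: {value}")
```

With 4 primaries and 2 replicas there are 12 shard copies per daily index, so roughly 3 per data node if they are balanced evenly.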

Used Query:

  "query": {
    "match_phrase": {
      "our.searched.field": "value"
    }
  }

The data nodes do not log any errors, but monitoring clearly shows CPU rising to 100% for a period of time.

Should we change the number of shards of our indices, add more CPU, or do something else?

I don't know whether this is related to query performance; it is probably worth asking in the Elasticsearch section of this forum.

As far as K8s is concerned, there is a known performance issue related to CPU limits. I documented my findings a little while ago.

In short, you could either:

  • use a recent kernel version with the fix included
  • remove the CPU limits (keep only CPU requests)
  • tweak the CFS quota settings
  • tweak
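
To make the CPU-limit point a bit more concrete, here is a rough sketch of how a limit translates into a CFS quota and how throttling can appear. It assumes the default 100 ms CFS period and a node with enough cores for all threads to run in parallel. The function names and the thread count of 5 are hypothetical, chosen only for illustration; this models the plain quota mechanism, not the kernel bug itself.

```python
CFS_PERIOD_US = 100_000  # default CFS period: 100 ms


def cfs_quota_us(cpu_limit_millicores, period_us=CFS_PERIOD_US):
    """CPU time (in microseconds) the container may use per CFS period."""
    return cpu_limit_millicores * period_us // 1000


def throttled_ms_per_period(cpu_limit_millicores, busy_threads,
                            period_us=CFS_PERIOD_US):
    """Time per period during which the container is throttled.

    Simplified model: all busy threads run in parallel on separate
    cores until the shared quota is used up, then everything stalls
    until the next period starts.
    """
    quota = cfs_quota_us(cpu_limit_millicores, period_us)
    demand = busy_threads * period_us
    if demand <= quota:
        return 0.0
    run_us = quota / busy_threads  # wall-clock time until quota is spent
    return (period_us - run_us) / 1000


# A 3000m limit, as in the configuration from the question.
print(cfs_quota_us(3000))                 # 300000 us of CPU per 100 ms
print(throttled_ms_per_period(3000, 5))   # 40.0 ms stalled per period
print(throttled_ms_per_period(3000, 3))   # 0.0, demand fits the quota
```

In this simplified model, 5 busy threads under a 3000m limit spend 40 ms of every 100 ms period throttled, which shows up as added latency even though the pod looks "maxed out" at its limit.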