Elasticsearch eats 100% of CPU

I have a 3-node Elasticsearch cluster (version 7) on this hardware, per node:

  • vCPU: 8
  • RAM: 16 GB (8 GB heap)
  • Disk: SSD, 300 GB
  • OS: CentOS 7

So in total my cluster has 24 vCPU, 48 GB RAM, and 1.2 TB of disk. I have 70 indices (each index has 3 primary shards and 1 replica), and each index is about 25 GB.

This cluster is used for storing logs from our services (EFK stack), and we have Kibana and Logtrail to visualize the logs in a web UI.

I have a CPU bottleneck. Here's the output from the top command:

PID    USER     PR  NI    VIRT    RES    SHR S  %CPU  %MEM    TIME+     COMMAND
15995 elastic+  20   0   92,2g  10,1g   1,3g S  640,0  64,9   269:27.70  java

So as you can see, the Elasticsearch process is eating all my CPU.

Output from GET /_nodes/hot_threads:

curl 9.9.9.9:9200/_nodes/hot_threads
::: {10.0.87.160}{xPDGrNNmR962Mf51smGWQQ}{T6VTDspVSuujxVgi8KnuTg}{10.0.87.160}{10.0.87.160:9300}{dim}{xpack.installed=true}
   Hot threads at 2020-09-18T07:59:48.726Z, interval=500ms, busiestThreads=3, ignoreIdleThreads=true:
   100.2% (501.1ms out of 500ms) cpu usage by thread 'elasticsearch[10.0.87.160][search][T#1]'
     2/10 snapshots sharing following 43 elements
       app//org.apache.lucene.codecs.blocktree.IntersectTermsEnum._next(IntersectTermsEnum.java:510)
       app//org.apache.lucene.codecs.blocktree.IntersectTermsEnum.next(IntersectTermsEnum.java:353)
       

So as you can see, the search threads are hot and eating 100% of CPU, but the Elasticsearch slowlog file is empty. So where is the problem?

Welcome to the community!

So, healthy hardware for your size, but the math shows 3.5 TB of data: 70 x 25 GB x 2 (primary plus replica) is more than 1.2 TB of disk.

Are all three nodes at high CPU, or just one?
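
To spell that arithmetic out, here is a quick sketch using only the numbers from your post. I'm assuming the 25 GB figure is the primary size of each index (before replicas) and that each node has one 300 GB disk; if 25 GB already includes the replica, halve the result.

```python
# Figures quoted in the original post (approximate).
indices = 70
size_per_index_gb = 25   # assumption: primary size, replicas not included
copies = 2               # 1 primary copy + 1 replica copy

nodes = 3
disk_per_node_gb = 300   # assumption: one 300 GB SSD per node

expected_gb = indices * size_per_index_gb * copies
cluster_disk_gb = nodes * disk_per_node_gb

print(f"expected data on disk: {expected_gb} GB")
print(f"raw cluster disk:      {cluster_disk_gb} GB")
print(f"fits on disk:          {expected_gb <= cluster_disk_gb}")
```

Either the index count, the per-index size, or the disk size in the post has to be off, which is why actual cluster stats are more useful than estimates.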

Are you doing crazy searches, like across all indexes and times?

Perhaps post cluster stats? And do you have monitoring on in Kibana, as that may give you some insights.

Thanks! Oh yeah, math is my weak side!
CPU consumption is high on all three nodes.

Here are my stats:

{
  "_nodes" : {
    "total" : 3,
    "successful" : 3,
    "failed" : 0
  },
  "cluster_name" : "antprod-efk",
  "cluster_uuid" : "dXB6ZDLsTryY3jOopoDTCQ",
  "timestamp" : 1600431746805,
  "status" : "green",
  "indices" : {
    "count" : 79,
    "shards" : {
      "total" : 434,
      "primaries" : 217,
      "replication" : 1.0,
      "index" : {
        "shards" : {
          "min" : 2,
          "max" : 6,
          "avg" : 5.493670886075949
        },
        "primaries" : {
          "min" : 1,
          "max" : 3,
          "avg" : 2.7468354430379747
        },
        "replication" : {
          "min" : 1.0,
          "max" : 1.0,
          "avg" : 1.0
        }
      }
    },
    "docs" : {
      "count" : 300367932,
      "deleted" : 101185
    },
    "store" : {
      "size" : "615.6gb",
      "size_in_bytes" : 661090178488
    },
    "fielddata" : {
      "memory_size" : "99.7kb",
      "memory_size_in_bytes" : 102192,
      "evictions" : 0
    },
    "query_cache" : {
      "memory_size" : "217.4mb",
      "memory_size_in_bytes" : 227976381,
      "total_count" : 619309100,
      "hit_count" : 52980572,
      "miss_count" : 566328528,
      "cache_size" : 19318,
      "cache_count" : 38493,
      "evictions" : 19175
    },
    "completion" : {
      "size" : "0b",
      "size_in_bytes" : 0
    },
    "segments" : {
      "count" : 8321,
      "memory" : "636.8mb",
      "memory_in_bytes" : 667830165,
      "terms_memory" : "340mb",
      "terms_memory_in_bytes" : 356525273,
      "stored_fields_memory" : "264.8mb",
      "stored_fields_memory_in_bytes" : 277682208,
      "term_vectors_memory" : "0b",
      "term_vectors_memory_in_bytes" : 0,
      "norms_memory" : "27.9mb",
      "norms_memory_in_bytes" : 29300544,
      "points_memory" : "0b",
      "points_memory_in_bytes" : 0,
      "doc_values_memory" : "4.1mb",
      "doc_values_memory_in_bytes" : 4322140,
      "index_writer_memory" : "85.3mb",
      "index_writer_memory_in_bytes" : 89474368,
      "version_map_memory" : "90.8kb",
      "version_map_memory_in_bytes" : 93012,
      "fixed_bit_set" : "592b",
      "fixed_bit_set_memory_in_bytes" : 592,
      "max_unsafe_auto_id_timestamp" : 1600412416264,
      "file_sizes" : { }
    }
  },
  "nodes" : {
    "count" : {
      "total" : 3,
      "coordinating_only" : 0,
      "data" : 3,
      "ingest" : 3,
      "master" : 3,
      "ml" : 0,
      "voting_only" : 0
    },
    "versions" : [
      "7.6.2"
    ],
    "os" : {
      "available_processors" : 24,
      "allocated_processors" : 24,
      "names" : [
        {
          "name" : "Linux",
          "count" : 3
        }
      ],
      "pretty_names" : [
        {
          "pretty_name" : "CentOS Linux 7 (Core)",
          "count" : 3
        }
      ],
      "mem" : {
        "total" : "46.4gb",
        "total_in_bytes" : 49913122816,
        "free" : "938.1mb",
        "free_in_bytes" : 983678976,
        "used" : "45.5gb",
        "used_in_bytes" : 48929443840,
        "free_percent" : 2,
        "used_percent" : 98
      }
    },
    "process" : {
      "cpu" : {
        "percent" : 267
      },
      "open_file_descriptors" : {
        "min" : 2119,
        "max" : 2335,
        "avg" : 2219
      }
    },
    "jvm" : {
      "max_uptime" : "5.7h",
      "max_uptime_in_millis" : 20614865,
      "versions" : [
        {
          "version" : "13.0.2",
          "vm_name" : "OpenJDK 64-Bit Server VM",
          "vm_version" : "13.0.2+8",
          "vm_vendor" : "AdoptOpenJDK",
          "bundled_jdk" : true,
          "using_bundled_jdk" : true,
          "count" : 3
        }
      ],
      "mem" : {
        "heap_used" : "14.2gb",
        "heap_used_in_bytes" : 15275600104,
        "heap_max" : "23.8gb",
        "heap_max_in_bytes" : 25560612864
      },
      "threads" : 314
    },
    "fs" : {
      "total" : "899.5gb",
      "total_in_bytes" : 965883211776,
      "free" : "275.7gb",
      "free_in_bytes" : 296063852544,
      "available" : "275.7gb",
      "available_in_bytes" : 296063852544
    },
    "plugins" : [ ],
    "network_types" : {
      "transport_types" : {
        "netty4" : 3
      },
      "http_types" : {
        "netty4" : 3
      }
    },
    "discovery_types" : {
      "zen" : 3
    },
    "packaging_types" : [
      {
        "flavor" : "default",
        "type" : "rpm",
        "count" : 3
      }
    ],
    "ingest" : {
      "number_of_pipelines" : 3,
      "processor_stats" : {
        "gsub" : {
          "count" : 0,
          "failed" : 0,
          "current" : 0,
          "time" : "0s",
          "time_in_millis" : 0
        },
        "rename" : {
          "count" : 0,
          "failed" : 0,
          "current" : 0,
          "time" : "0s",
          "time_in_millis" : 0
        },
        "script" : {
          "count" : 0,
          "failed" : 0,
          "current" : 0,
          "time" : "0s",
          "time_in_millis" : 0
        },
        "set" : {
          "count" : 0,
          "failed" : 0,
          "current" : 0,
          "time" : "0s",
          "time_in_millis" : 0
        }
      }
    }
  }
}

Thanks. It all looks pretty good and normal (maybe others can see indications in there): lots of CPUs, lots of RAM, data not really that large.

Heap is only about 60% used (14.2 GB of 23.8 GB), so cache pressure is not likely. I'd have to think your queries are heavy-duty or doing massive work in some way. Maybe also huge templates/mappings or some other CPU-eating workload.
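
For reference, this is how I read those numbers out of the stats you posted. It is just a small sketch; the values are copied from the cluster stats JSON above, and the variable names are my own.

```python
# Values copied from the cluster stats output in this thread.
heap_used_in_bytes = 15_275_600_104
heap_max_in_bytes = 25_560_612_864
query_cache_hit_count = 52_980_572
query_cache_total_count = 619_309_100

heap_pct = 100 * heap_used_in_bytes / heap_max_in_bytes
cache_hit_pct = 100 * query_cache_hit_count / query_cache_total_count

print(f"heap used:            {heap_pct:.1f}%")
print(f"query cache hit rate: {cache_hit_pct:.1f}%")
```

The query cache hit rate comes out low, but those counters are cumulative, so I wouldn't treat that number alone as proof of anything.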

IntersectTermsEnum

Not sure what this does, but I bet it's about intersecting terms, building term lists, etc., which presumably is heavily involved in your queries.

Maybe your visualizations are also using long time ranges; I've seen high loads from week- or month-long graphs on log or other data. With lots of users that might be an issue, but it really depends.

You can profile queries in Kibana to see if common things you do are performing well, though I'm not sure you can do it for all the built-in Kibana things, UI, etc.
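
Outside Kibana, the same profiling is available by adding "profile": true to a search body. Below is a minimal sketch using only the Python standard library. The URL, the index pattern, and the wildcard query on a "message" field are all placeholders I made up, not something taken from your setup; as far as I know IntersectTermsEnum shows up when Lucene matches a pattern (wildcard, regexp, prefix) against the terms dictionary, so that is the kind of query I would try first.

```python
import json
import urllib.request


def build_profiled_search(query, size=0):
    """Wrap a query DSL dict in a search body with profiling enabled."""
    return {"profile": True, "size": size, "query": query}


def run_search(base_url, index, body):
    """POST the body to <base_url>/<index>/_search and return the parsed JSON."""
    req = urllib.request.Request(
        f"{base_url}/{index}/_search",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def slowest_query_parts(response, top=5):
    """Return (time_in_nanos, shard_id, query_type) tuples, slowest first."""
    rows = []
    for shard in response.get("profile", {}).get("shards", []):
        for search in shard.get("searches", []):
            for part in search.get("query", []):
                rows.append((part["time_in_nanos"], shard["id"], part["type"]))
    return sorted(rows, reverse=True)[:top]


# Hypothetical query: a leading-wildcard match on a "message" field.
body = build_profiled_search({"wildcard": {"message": "*timeout*"}})

# Example call (not executed here; URL and index pattern are placeholders):
# print(slowest_query_parts(run_search("http://localhost:9200", "logs-*", body)))
```

Run it against one index and a short time range first, since profiling adds overhead of its own.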

Exactly what version are you running, and what JVM?

Note stats show:

"versions" : "7.6.2"

"version" : "13.0.2",
"vm_name" : "OpenJDK 64-Bit Server VM",
"vm_version" : "13.0.2+8",
"vm_vendor" : "AdoptOpenJDK",
"bundled_jdk" : true,
