Elastic eating 25% CPU

Hi,

I am very new to logging. About a month ago I successfully installed an ELK stack for monitoring, alongside Grafana, Prometheus, and cAdvisor (all running in Docker).

But for about two weeks now I have been seeing constant 25% CPU usage from Elasticsearch (the blue area):

[screenshot: 2-week range]

[screenshot: last few hours]

I am not quite sure where to start digging. I have read a few things about indices, but since the whole setup has only been running for a month, I wonder if that is the right approach. Today I also started receiving the following warning on an hourly basis:

[WARN ][o.e.m.j.JvmGcMonitorService] [RmS4tmQ] [gc][1025] overhead, spent [517ms] collecting in the last [1s]

Can someone please point me in the right direction, beginner-style (if possible with a step-by-step introduction)?

In general, to find out more about what ES is doing, use the Cluster APIs (which are what all monitoring tools use). In your particular case, hot_threads should provide a lot of information on what the CPU is being spent on.
It may very well be that you either have some expensive queries being run regularly, or the node is low on memory and the JVM keeps GCing, which eats CPU.
For queries, you can use profiling.
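For reference, profiling a query is just a matter of adding `"profile": true` to the search body. A minimal sketch (assumes ES is reachable on localhost:9200; the `logstash-*` index pattern is a placeholder, not something from this thread - use whatever your stack actually writes to):

```shell
# Profile a search to see which query components the CPU time goes to.
# NOTE: "logstash-*" is a hypothetical index pattern - adjust to yours.
BODY='{"profile": true, "query": {"match_all": {}}}'
curl -s "localhost:9200/logstash-*/_search?pretty" \
  -H 'Content-Type: application/json' \
  -d "$BODY" || true  # ignore the failure if no cluster is running
```

The response then contains a `profile` section with a per-shard breakdown of where query time was spent.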

Thanks. I read about what you suggested and looked at the dumps, but I am not sure what I should be looking for. While reading and searching I stumbled over heap size and am wondering if it could have something to do with my issue, as I also constantly receive:

[2017-11-21T18:53:32,721][WARN ][o.e.m.j.JvmGcMonitorService] [RmS4tmQ] [gc][379668] overhead, spent [536ms] collecting in the last [1s]

By the way I have a machine:

4 core
64GB RAM (but about 20GB is locked in by another application)
2x4TB HDD
elasticsearch:5.5.0

and am running in Docker. As I copied the settings for docker-compose.yml, I have:

environment:
  - ES_JAVA_OPTS=-Xms512m -Xmx512m

I ran hot_threads a few times

curl "localhost:9200/_nodes/hot_threads?human&pretty"

with the following results:

1.
::: {RmS4tmQ}{RmS4tmQpQLSowgPw2A3F7A}{hf-kaOtgTsypWrNI6nMnOw}{127.0.0.1}{127.0.0.1:9300}
Hot threads at 2017-11-21T10:18:16.033Z, interval=500ms, busiestThreads=3, ignoreIdleThreads=true:

2.

::: {RmS4tmQ}{RmS4tmQpQLSowgPw2A3F7A}{hf-kaOtgTsypWrNI6nMnOw}{127.0.0.1}{127.0.0.1:9300}
Hot threads at 2017-11-21T10:18:44.241Z, interval=500ms, busiestThreads=3, ignoreIdleThreads=true:

0.0% (56micros out of 500ms) cpu usage by thread 'elasticsearch[RmS4tmQ][[timer]]'
 10/10 snapshots sharing following 2 elements
   java.lang.Thread.sleep(Native Method)
   org.elasticsearch.threadpool.ThreadPool$CachedTimeThread.run(ThreadPool.java:536)

3.

::: {RmS4tmQ}{RmS4tmQpQLSowgPw2A3F7A}{hf-kaOtgTsypWrNI6nMnOw}{127.0.0.1}{127.0.0.1:9300}
Hot threads at 2017-11-21T10:18:57.520Z, interval=500ms, busiestThreads=3, ignoreIdleThreads=true:

0.0% (121.7micros out of 500ms) cpu usage by thread 'elasticsearch[RmS4tmQ][scheduler][T#1]'
 unique snapshot
   java.util.Spliterators$1Adapter.hasNext(Spliterators.java:681)
   org.elasticsearch.indices.IndexingMemoryController.availableShards(IndexingMemoryController.java:154)
   org.elasticsearch.indices.IndexingMemoryController$ShardsIndicesStatusChecker.runUnlocked(IndexingMemoryController.java:291)
   org.elasticsearch.indices.IndexingMemoryController$ShardsIndicesStatusChecker.run(IndexingMemoryController.java:277)
   org.elasticsearch.threadpool.ThreadPool$ReschedulingRunnable.doRun(ThreadPool.java:811)
   org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37)
   org.elasticsearch.threadpool.ThreadPool$LoggingRunnable.run(ThreadPool.java:436)
   java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
   java.util.concurrent.FutureTask.run(FutureTask.java:266)
   java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
   java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
   java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
   java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
   java.lang.Thread.run(Thread.java:748)

Do you see anything I should follow up on?

I also did the following

curl localhost:9200/_cluster/health?pretty=true

{
  "cluster_name" : "elasticsearch",
  "status" : "yellow",
  "timed_out" : false,
  "number_of_nodes" : 1,
  "number_of_data_nodes" : 1,
  "active_primary_shards" : 556,
  "active_shards" : 556,
  "relocating_shards" : 0,
  "initializing_shards" : 0,
  "unassigned_shards" : 556,
  "delayed_unassigned_shards" : 0,
  "number_of_pending_tasks" : 0,
  "number_of_in_flight_fetch" : 0,
  "task_max_waiting_in_queue_millis" : 0,
  "active_shards_percent_as_number" : 50.0
}
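Side note on that health output: on a single node, replica shards can never be assigned (a replica is not allowed to live on the same node as its primary), which is exactly why `unassigned_shards` equals `active_shards` and the status is yellow. If you stay on one node you can drop replicas entirely; a sketch (the `*` pattern deliberately hits every index, so make sure that is what you want - this clears the yellow status, though it is not necessarily related to the CPU usage):

```shell
# Set replicas to 0 on all indices of a single-node cluster.
# "*" matches every index - narrow the pattern if you only want some.
SETTINGS='{"index": {"number_of_replicas": 0}}'
curl -s -XPUT "localhost:9200/*/_settings" \
  -H 'Content-Type: application/json' \
  -d "$SETTINGS" || true  # ignore the failure if no cluster is running
```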

curl "localhost:9200/_cluster/pending_tasks?human&pretty"

{
  "tasks" : [ ]
}

curl "localhost:9200/_cluster/reroute?human&pretty"

{
  "error" : {
    "root_cause" : [
      {
        "type" : "illegal_argument_exception",
        "reason" : "No endpoint or operation is available at [reroute]"
      }
    ],
    "type" : "illegal_argument_exception",
    "reason" : "No endpoint or operation is available at [reroute]"
  },
  "status" : 400
}

curl "localhost:9200/_cluster/settings?human&pretty"

{
  "persistent" : { },
  "transient" : { }
}

curl "localhost:9200/_nodes/usage?human&pretty"

{
  "_nodes" : {
    "total" : 0,
    "successful" : 0,
    "failed" : 0
  },
  "cluster_name" : "elasticsearch",
  "nodes" : { }
}

curl "localhost:9200/_remote/info?human&pretty"

{ }

curl "localhost:9200/myindex?human&pretty"

{
  "error" : {
    "root_cause" : [
      {
        "type" : "index_not_found_exception",
        "reason" : "no such index",
        "resource.type" : "index_or_alias",
        "resource.id" : "myindex",
        "index_uuid" : "_na_",
        "index" : "myindex"
      }
    ],
    "type" : "index_not_found_exception",
    "reason" : "no such index",
    "resource.type" : "index_or_alias",
    "resource.id" : "myindex",
    "index_uuid" : "_na_",
    "index" : "myindex"
  },
  "status" : 404
}

And last but not least

java -XX:+PrintFlagsFinal -version | grep HeapSize

uintx ErgoHeapSizeLimit                         = 0                                   {product}
uintx HeapSizePerGCThread                       = 87241520                            {product}   # ~83MB
uintx InitialHeapSize                          := 1054867456                          {product}   # ~1006MB
uintx LargePageHeapSizeThreshold                = 134217728                           {product}   # 128MB
uintx MaxHeapSize                              := 16871587840                         {product}   # ~16090MB
openjdk version "1.8.0_131" 
OpenJDK Runtime Environment (build 1.8.0_131-8u131-b11-2-b11)
OpenJDK 64-Bit Server VM (build 25.131-b11, mixed mode)

I would highly appreciate any further help on what to follow up on or try.
Thanks

This means the JVM running ES is limited to 512MB of memory - that is a very low number, which explains why the JVM keeps having to GC. If I understand correctly you have roughly 44GB of memory left (64 minus the ~20 locked by the other application) - divide that by 2 (to leave space for the OS) and you have ~22GB for ES.
Remove the memory limitation or increase it (4GB minimum) and you should see a big increase in performance - the more RAM you give ES, the better (although past a ~32GB heap it is better to start up another node instead).
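In docker-compose terms that is a one-line change (a sketch; the 4g figure is the minimum suggested above, and you may also want a container memory limit with headroom beyond the heap):

```yaml
environment:
  - ES_JAVA_OPTS=-Xms4g -Xmx4g   # heap only; the process needs extra room beyond this
```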

Thank you.

1.) I am just wondering about the right size of the heap. The Elasticsearch documentation says to give it 50% of RAM, but no more than 32GB - but that is basically the maximum. So what is the RIGHT amount, and is there a way to figure it out? I have other resource-intensive processes running whose RAM needs vary dynamically, so I would like to set the ES Xmx as low as possible, but high enough that it does not act up.

2.) Given that I have set

ES_JAVA_OPTS=-Xms512m -Xmx512m

how come I get the following

java -XX:+PrintFlagsFinal -version | grep HeapSize

uintx InitialHeapSize                          := 1054867456                          {product}   # ~1006MB
uintx MaxHeapSize                              := 16871587840                         {product}   # ~16090MB

Does the heap size resize by itself?

You are starting two different Java processes.
The first one, with ES_JAVA_OPTS, starts Elasticsearch, which keeps running.
The second, java -version, starts a separate process that simply prints out the version. Since you don't specify the -Xms/-Xmx options there, it uses the defaults, which is what you see above.
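An easy way to see the heap that the long-running ES process actually got is the cat nodes API (assumes ES on localhost:9200; `heap.current` and `heap.max` are standard `_cat/nodes` columns):

```shell
# Show current and maximum heap of each node in the cluster.
URL='localhost:9200/_cat/nodes?v&h=name,heap.current,heap.max'
curl -s "$URL" || true  # ignore the failure if no cluster is running
```

With your compose settings this should report a max of ~512MB, regardless of what a fresh `java -version` process defaults to.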

As for what the right amount is - you already mentioned the rule of thumb yourself: take the free RAM, (64 - 20) / 2 ≈ 22GB.
If you can afford it, go for it - if not, use a smaller heap size. In the end it all depends on how busy your cluster will be.
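The arithmetic spelled out, using this machine's numbers (64GB total, ~20GB locked by the other application; halving leaves the remainder to the OS filesystem cache, which Lucene leans on heavily):

```shell
total_gb=64        # machine RAM
locked_gb=20       # reserved by the other application
free_gb=$(( total_gb - locked_gb ))
heap_gb=$(( free_gb / 2 ))   # half to the ES heap, half to the OS cache
echo "${heap_gb}GB"          # -> 22GB
```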

Setting ES_JAVA_OPTS to just 4GB did the trick. Thanks.

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.