Help please with high CPU utilization on one node of a cluster :)

Hi all,

One of my 6 dedicated data nodes has very high CPU utilization. I'm hoping for some help figuring out what is causing it.

Here is the output from "top".

top - 17:51:19 up 91 days,  3:38,  1 user,  load average: 25.01, 23.68, 17.83
Tasks: 671 total,   1 running, 670 sleeping,   0 stopped,   0 zombie
Cpu(s):  0.0%us, 40.7%sy,  0.0%ni, 59.2%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Mem:  65858520k total, 64972200k used,   886320k free,   233832k buffers
Swap:  8388600k total,    24064k used,  8364536k free, 26568692k cached

   PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND          
 32206 elastics  20   0  763g  38g 6.4g S 1304.9 61.8   2187:13 java    

You can see that the load average is crazy high, and CPU utilization is as well. The problem does not fix itself (I've seen it run over 2 days), and all I can do is cycle the JVM.

This is a 16-core machine.
Here is the Java version:

root@bdprodes08:[218]:~> java -version
java version "1.7.0_79"
OpenJDK Runtime Environment (rhel-2.5.5.3.el6_6-x86_64 u79-b14)
OpenJDK 64-Bit Server VM (build 24.79-b02, mixed mode)

I'm running ES 1.6.0.

Here is the (very snipped) output of the hot_threads API for this node:

::: [elasticsearch-bdprodes08][MWIHO5NQSjChyPm196c5ug][bdprodes08][inet[/10.200.116.248:9300]]{master=false}
   Hot threads at 2015-10-12T17:51:45.866Z, interval=500ms, busiestThreads=10, ignoreIdleThreads=true:
   
   100.8% (504.2ms out of 500ms) cpu usage by thread 'elasticsearch[elasticsearch-bdprodes08][bulk][T#4]'
     9/10 snapshots sharing following 14 elements
       org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:1526)
       org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:1252)
       org.elasticsearch.index.engine.InternalEngine.innerCreateNoLock(InternalEngine.java:356)
       org.elasticsearch.index.engine.InternalEngine.innerCreate(InternalEngine.java:298)
       org.elasticsearch.index.engine.InternalEngine.create(InternalEngine.java:269)
       org.elasticsearch.index.shard.IndexShard.create(IndexShard.java:483)
   
   100.8% (504.2ms out of 500ms) cpu usage by thread 'elasticsearch[elasticsearch-bdprodes08][[derbysoft-ihg-20151012][0]: Lucene Merge Thread #174]'
     9/10 snapshots sharing following 20 elements
       java.io.FileOutputStream.writeBytes(Native Method)
       java.io.FileOutputStream.write(FileOutputStream.java:345)
       org.apache.lucene.store.FSDirectory$FSIndexOutput$1.write(FSDirectory.java:390)
       java.util.zip.CheckedOutputStream.write(CheckedOutputStream.java:73)
       java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82)
       java.io.BufferedOutputStream.write(BufferedOutputStream.java:95)
       org.apache.lucene.store.OutputStreamIndexOutput.writeByte(OutputStreamIndexOutput.java:45)

   100.8% (504.1ms out of 500ms) cpu usage by thread 'elasticsearch[elasticsearch-bdprodes08][[derbysoft-ihg-20151012][0]: Lucene Merge Thread #181]'
     2/10 snapshots sharing following 21 elements
       java.io.FileOutputStream.writeBytes(Native Method)
       java.io.FileOutputStream.write(FileOutputStream.java:345)
       org.apache.lucene.store.FSDirectory$FSIndexOutput$1.write(FSDirectory.java:390)

   83.7% (418.2ms out of 500ms) cpu usage by thread 'elasticsearch[elasticsearch-bdprodes08][[nuke_metrics][1]: Lucene Merge Thread #66]'
     unique snapshot
       org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:476)

   16.9% (84.4ms out of 500ms) cpu usage by thread 'elasticsearch[elasticsearch-bdprodes08][[derbysoft-bestwestern-20151012][0]: Lucene Merge Thread #125]'
     unique snapshot
       org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:476)
     2/10 snapshots sharing following 21 elements
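
(For reference, that output came from the hot threads endpoint; roughly this, with the host/port as a placeholder:)

curl -s 'http://localhost:9200/_nodes/elasticsearch-bdprodes08/hot_threads?threads=10'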

It looks like a combination of merges and bulk inserts, but I'm not sure why this one node in particular is having issues while the rest are not. Right now, I'm having to cycle this node about every two or three days, which on a cluster this size is somewhat of a pain. :smile:

Here are my cluster settings for merging:

 "index.merge.scheduler.max_thread_count": "1",
 "index.merge.policy.type": "tiered",
 "index.merge.policy.max_merged_segment": "5gb",
 "index.merge.policy.segments_per_tier": "10",
 "index.merge.policy.max_merge_at_once": "10",
 "index.merge.policy.max_merge_at_once_explicit": "10",
 "index.translog.flush_threshold_size": "1gb",

The data directory is made up of eight 2TB SATA disks in a RAID 0 stripe.

Can anyone see something misconfigured, or suggest something to try to fix this?

Thank you so much for your time and help!
Chris

If this is real load, 25 is high for a 16-core machine, but it's workable.

Do you happen to use a daily or weekly indexing strategy or something? Elasticsearch's allocation mechanisms treat every shard as equal, but in the typical time-based use case the current day/week/whatever's index gets much, much more load than the others because it's being written to. Whenever I have an index that I know I'm going to hammer, I use total_shards_per_node to force its shards apart from each other. You might find this helpful. But you should be cognizant that it is a "hard" limit on allocation: Elasticsearch won't violate the rule even if the cluster would go red. It's safe if you understand how it works.
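
To be concrete, it's a dynamic index setting, so something like this on the current day's index (the index name and value here are just an example; with 6 data nodes and 6 shards plus a replica, 2 gives you an exactly even spread):

curl -XPUT 'http://localhost:9200/derbysoft-ihg-20151012/_settings' -d '
{
  "index.routing.allocation.total_shards_per_node": 2
}'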

There is also the hot/warm architecture, where you have 3 nodes with beefy solid-state disks that you use for indexing, and when you stop writing to an index you use allocation awareness attributes to shift it onto nodes with larger, slower spinning disks. I don't know if that is the answer to your problem, but it is a general solution to a problem like yours.
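
The usual wiring for that is a custom attribute in elasticsearch.yml plus an allocation filter on the index; a sketch, with an arbitrary attribute name:

# elasticsearch.yml on the indexing (hot) nodes; "box_type" is just a made-up attribute name
node.box_type: hot

# once you stop writing to an index, shift it over to the warm nodes
curl -XPUT 'http://localhost:9200/derbysoft-ihg-20151012/_settings' -d '
{
  "index.routing.allocation.require.box_type": "warm"
}'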

Thanks Nik.

25 is "average" for the events that I've seen. Where things get nasty is when it gets to 40+; then indexing gets pretty dramatically backed up until I bounce the node. Yuck. Is there anything configuration-wise that looks suspicious to you, by chance? Anything else of interest?

Yes, we use a daily indexing strategy. I've got 6 data nodes, with number_of_shards = 6 and number_of_replicas = 1, so each node has one primary and one replica for each of my (currently) 16 indices. Honestly, I am a little hesitant to play with the total_shards_per_node setting right now, just because things are so evenly distributed. I'm curious to see what it might be about this particular node that always has the issue; I have yet to see it on any of the other 5.
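
For what it's worth, a quick way to eyeball the spread on that node is the cat API, something along these lines:

curl -s 'http://localhost:9200/_cat/shards?v' | grep bdprodes08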

Yeah... that would definitely be ideal. As of now, I have no SSD drives in these servers, but that could be another step in the architecture's evolution. I've read up on it and would love to implement it at some point.

Thanks again for the input.
Chris

Again. :((

Marvel for the last 24 hours.

Still open to any suggestions, and still digging into it myself!

Many thanks.
Chris

Is this node the master?

I am seeing a similar issue, with just one node experiencing high CPU and constant GC, but in my case it's the master node.

Might be a different issue....

Hi Tin,

For me, the node is a dedicated data node, and I'm not having any GC issues. Might be a different issue!

Chris

Are you distributing indexing and query load evenly across the cluster?

Well, AFAIK I am. :wink: I have one primary and one replica shard for each index on each of the 6 dedicated data nodes. I don't think querying is the issue, because there are no queries running when these high-CPU events happen, and the hot threads API shows indexing and merging as the busy threads.
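
A quick sanity check I can run the next time it happens is to compare the bulk thread pools across nodes, something like this (assuming I have the column names right):

curl -s 'http://localhost:9200/_cat/thread_pool?v&h=host,bulk.active,bulk.queue,bulk.rejected'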

I'm thinking I might play with some of those settings on the cluster (see my merge settings earlier in the thread) and see if perhaps I have them set too high... although it still bothers me that these events only happen on this one server.

I've also looked at potential hardware issues (failed disks, etc) to see if that could be contributing, but there are none.

Still looking. Thanks for all the replies!
Chris

Yes, it sounds like a different issue than mine. I believe you have replicas turned on. I have a 9-node cluster that sees fairly high CPU; when I turn off replicas for the current (active) index, the load goes down. I have a script that turns replicas back on for the previous day's index.
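
The script is basically just two calls to the index settings API, something like this (the index names here are only examples):

# drop the replica on today's actively-written index
curl -XPUT 'http://localhost:9200/logs-2015.10.12/_settings' -d '{"index.number_of_replicas": 0}'

# add the replica back for yesterday's index
curl -XPUT 'http://localhost:9200/logs-2015.10.11/_settings' -d '{"index.number_of_replicas": 1}'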

FWIW, you might want to try that.

Tin