After upgrading to 7.3 over the weekend, I now have a node that constantly sits at full CPU utilization. The output of _nodes/hot_threads is empty. The cluster has 25 indices and 250 shards in total, and is made up of 3 machines, each with 2 cores and 8 GB of memory.
Replacing the high-CPU node with a new machine did not fix the problem; the high CPU usage came back after rebalancing. Are there any known steps to fix this, or is this something new that was introduced in 7.3?
This is surprising, particularly since hot threads is empty. Could you share the full output of the following, using something like https://gist.github.com since it will be quite large?
GET _nodes/hot_threads?threads=99999&ignore_idle_threads=false
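If you are not running this from the Kibana console, a small script can capture the output to a file that is ready to upload. This is only a rough sketch: the base URL `http://localhost:9200`, the lack of authentication or TLS, and the output file name are all assumptions, and the helper names are made up for illustration.

```python
import urllib.parse
import urllib.request


def hot_threads_url(base_url, threads=99999, ignore_idle_threads=False):
    """Build the hot threads URL with the same parameters as the request above."""
    query = urllib.parse.urlencode({
        "threads": threads,
        # Elasticsearch expects lowercase true/false in the query string.
        "ignore_idle_threads": str(ignore_idle_threads).lower(),
    })
    return f"{base_url.rstrip('/')}/_nodes/hot_threads?{query}"


def save_hot_threads(base_url, out_path="hot_threads.txt"):
    """Fetch the plain-text hot threads report and write it to out_path.

    Assumes an unsecured cluster reachable at base_url (hypothetical setup).
    """
    with urllib.request.urlopen(hot_threads_url(base_url), timeout=60) as resp:
        body = resp.read()
    with open(out_path, "wb") as f:
        f.write(body)
    return out_path
```

Calling `save_hot_threads("http://localhost:9200")` would write the report to `hot_threads.txt` in the current directory; adjust the URL for your own cluster.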
Another possibility is that it's busy doing GC, which won't show up in the hot threads. Can you share the last thousand lines or so of the GC log too?
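To grab just the tail of the GC log, something along these lines should work. It is a minimal sketch: the function name is hypothetical, and the log path in the usage note is an assumption that depends on how Elasticsearch was installed.

```python
from collections import deque


def tail_lines(path, n=1000):
    """Return the last n lines of a text file without loading it all into memory."""
    with open(path, "r", encoding="utf-8", errors="replace") as f:
        # A bounded deque keeps only the most recent n lines as the file is read.
        return list(deque(f, maxlen=n))
```

For example, `print("".join(tail_lines("/var/log/elasticsearch/gc.log")))` would print the last thousand lines, assuming that is where your GC log lives.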
From what I can tell, GC behavior on all 3 nodes is quite similar. I saw Young Allocation Failures while reading the log, which is what prompted me to compare the nodes, and I'm happy to post more GC logs to show this.
The misbehaving node has gotten steadily worse (up to a load average of 20), and this has made our entire deployment unstable, so we are being forced to revert to 7.1.1. I would advise anyone reading this to carefully test 7.3.0 against their own environment and traffic pattern, or avoid it entirely.