We recently upgraded our ELK stack from v7.4.0 to v7.7.1, and we now see OS CPU at 100% on our hot data nodes, which do the indexing. OS CPU is almost always at 100%, while process CPU is around 45-50%.
We checked the hot threads, and about 98% of the time the hot thread is a Lucene Merge Thread. The refresh interval on our indices is 120s, and the size is 100 GB (3 shards).
Is this a bug in 7.7.1? Or are we doing something wrong? This was not the case in previous versions.
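For anyone who wants to repeat the hot threads check described above, here is a minimal Python sketch. It assumes a node reachable at `http://localhost:9200` without authentication, and it assumes the plain-text hot threads output contains per-thread lines of the form `NN.N% (...) cpu usage by thread '...'`. The names `fetch_hot_threads` and `merge_thread_share` are made up for illustration.

```python
import re
import urllib.request

# Assumed node address; replace with one of the hot data nodes.
ES_URL = "http://localhost:9200"

# Assumed shape of a per-thread header line in the hot threads output, e.g.
#   98.2% (491ms out of 500ms) cpu usage by thread 'elasticsearch[node][... Lucene Merge Thread #12]'
THREAD_LINE = re.compile(r"^\s*(\d+(?:\.\d+)?)% .*cpu usage by thread '(.*)'")


def fetch_hot_threads(base_url=ES_URL, threads=10):
    """Fetch the plain-text hot threads report from GET /_nodes/hot_threads."""
    url = f"{base_url}/_nodes/hot_threads?threads={threads}"
    with urllib.request.urlopen(url) as resp:
        return resp.read().decode("utf-8")


def merge_thread_share(hot_threads_text):
    """Return (merge_threads, total_threads) found in a hot threads report."""
    merges = 0
    total = 0
    for line in hot_threads_text.splitlines():
        match = THREAD_LINE.match(line)
        if match is None:
            continue  # stack frames, node headers, blank lines
        total += 1
        if "Lucene Merge Thread" in match.group(2):
            merges += 1
    return merges, total
```

Calling `merge_thread_share(fetch_hot_threads())` several times over a few minutes gives a rough picture of how often merge threads dominate the report.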
One more piece of information: I downgraded our cluster to 7.4.0 with the exact same config, and the CPU looks perfectly fine now.
That makes me wonder whether something in 7.7.1 is perhaps reporting incorrect metrics.
Yeah, any GC going on? If not, this seems odd unless there were changes to merge behavior or scheduling (you can only limit the thread count), but I doubt anything big changed between 7.4 and 7.7. You could try tuning merging just to see if that's the issue, and also upgrade to 7.8 to see if that helps.
Any change to templates, especially for refresh rate?
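As a rough illustration of the merge tuning suggested above, the sketch below builds an index settings update that caps `index.merge.scheduler.max_thread_count`. The node address, the index name `logs-hot`, and the value `1` are assumptions for the example, not recommendations, and the helper names are hypothetical. It is worth confirming in the docs for your version that this setting can be changed on a live index.

```python
import json
import urllib.request

# Assumed node address; replace with a real node in the cluster.
ES_URL = "http://localhost:9200"


def build_merge_settings_request(index, max_threads, base_url=ES_URL):
    """Build (but do not send) a PUT /<index>/_settings request that limits
    the number of concurrent merge threads per shard."""
    body = {"index": {"merge": {"scheduler": {"max_thread_count": max_threads}}}}
    return urllib.request.Request(
        url=f"{base_url}/{index}/_settings",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


def send(request):
    """Send a prepared request and return the decoded JSON response."""
    with urllib.request.urlopen(request) as resp:
        return json.load(resp)


# Hypothetical usage: cap merging at one thread on an index named "logs-hot".
# response = send(build_merge_settings_request("logs-hot", 1))
```

Building the request separately from sending it makes it easy to inspect the URL and body before touching a production cluster. If CPU drops after lowering the limit, that would point at merging rather than at metric reporting.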