Very uncommon things happening with 7.17.0 ELK Cluster

ELK 7.17.0 has been costing me a lot of time and effort.
After changing the heap memory of the ELK pods, I have observed that all thread pools are consuming high CPU even though no transforms, searches, etc. are running. I am looking for some insight on this; please find the logs below. Note that the cluster is configured with the default thread pool settings, with 3 master and 6 data nodes.

Do we need to change the default thread pool configuration?

Sample log below:
88.0% [cpu=2.9%, other=85.1%] (440ms out of 500ms) cpu usage by thread 'elasticsearch[elasticsearch-xxxxxxxxxxxxxxxxxxxxxxx][refresh][T#1]'
  10/10 snapshots sharing following 28 elements
    app//org.apache.lucene.index.DocumentsWriterPerThread.flush(DocumentsWriterPerThread.java:350)
    app//org.apache.lucene.index.DocumentsWriter.doFlush(DocumentsWriter.java:476)
    app//org.apache.lucene.index.DocumentsWriter.flushAllThreads(DocumentsWriter.java:656)
    app//org.apache.lucene.index.IndexWriter.getReader(IndexWriter.java:605)
    app//org.apache.lucene.index.StandardDirectoryReader.doOpenFromWriter(StandardDirectoryReader.java:293)
    app//org.apache.lucene.index.StandardDirectoryReader.doOpenIfChanged(StandardDirectoryReader.java:268)
    app//org.apache.lucene.index.StandardDirectoryReader.doOpenIfChanged(StandardDirectoryReader.java:258)
    app//org.apache.lucene.index.FilterDirectoryReader.doOpenIfChanged(FilterDirectoryReader.java:112)
    app//org.apache.lucene.index.DirectoryReader.openIfChanged(DirectoryReader.java:173)
    app//org.elasticsearch.index.engine.ElasticsearchReaderManager.refreshIfNeeded(ElasticsearchReaderManager.java:56)
    app//org.elasticsearch.index.engine.ElasticsearchReaderManager.refreshIfNeeded(ElasticsearchReaderManager.java:28)
    app//org.apache.lucene.search.ReferenceManager.doMaybeRefresh(ReferenceManager.java:176)
    app//org.apache.lucene.search.ReferenceManager.maybeRefreshBlocking(ReferenceManager.java:253)
    app//org.elasticsearch.index.engine.InternalEngine$ExternalReaderManager.refreshIfNeeded(InternalEngine.java:370)
    app//org.elasticsearch.index.engine.InternalEngine$ExternalReaderManager.refreshIfNeeded(InternalEngine.java:350)
    app//org.apache.lucene.search.ReferenceManager.doMaybeRefresh(ReferenceManager.java:176)
    app//org.apache.lucene.search.ReferenceManager.maybeRefresh(ReferenceManager.java:225)
    app//org.elasticsearch.index.engine.InternalEngine.refresh(InternalEngine.java:1891)
    app//org.elasticsearch.index.engine.InternalEngine.maybeRefresh(InternalEngine.java:1870)
    app//org.elasticsearch.index.shard.IndexShard.scheduledRefresh(IndexShard.java:3910)
    app//org.elasticsearch.index.IndexService.maybeRefreshEngine(IndexService.java:917)
    app//org.elasticsearch.index.IndexService.access$200(IndexService.java:102)
    app//org.elasticsearch.index.IndexService$AsyncRefreshTask.runInternal(IndexService.java:1043)
    app//org.elasticsearch.common.util.concurrent.AbstractAsyncTask.run(AbstractAsyncTask.java:133)
    app//org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:718)
    java.base@17.0.1/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
    java.base@17.0.1/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
    java.base@17.0.1/java.lang.Thread.run(Thread.java:833)

All that shows is your indices are having refreshes applied. It doesn't show high CPU or threadpool use.
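
To make that reading concrete: the header line of the sample reports 88.0% in total, of which only 2.9% is labelled cpu and 85.1% is labelled other. My understanding, which is an assumption rather than something confirmed in this thread, is that "other" covers time the thread was not actually on the CPU, such as waiting or blocked on I/O. A minimal Python sketch for pulling those numbers out of a hot threads dump follows; the function name and the regular expression are my own and are based only on the sample line above.

```python
import re

# Matches the summary line that precedes each stack in the hot threads output, e.g.
# "88.0% [cpu=2.9%, other=85.1%] (440ms out of 500ms) cpu usage by thread '...'"
HEADER_RE = re.compile(
    r"(?P<total>\d+(?:\.\d+)?)% "
    r"\[cpu=(?P<cpu>\d+(?:\.\d+)?)%, other=(?P<other>\d+(?:\.\d+)?)%\] "
    r"\((?P<busy>\S+) out of (?P<interval>\S+)\) "
    r"cpu usage by thread '(?P<thread>[^']+)'"
)


def parse_hot_thread_header(line):
    """Return the percentages and thread name from one header line, or None."""
    match = HEADER_RE.search(line)
    if match is None:
        return None
    return {
        "total_pct": float(match["total"]),
        "cpu_pct": float(match["cpu"]),
        "other_pct": float(match["other"]),
        "busy": match["busy"],
        "interval": match["interval"],
        "thread": match["thread"],
    }


# Header line copied from the sample log in this thread.
sample = (
    "88.0% [cpu=2.9%, other=85.1%] (440ms out of 500ms) cpu usage by thread "
    "'elasticsearch[elasticsearch-xxxxxxxxxxxxxxxxxxxxxxx][refresh][T#1]'"
)
info = parse_hot_thread_header(sample)
print(info["thread"], info["cpu_pct"], info["other_pct"])
```

Running this over every header in a dump and sorting by cpu_pct separates threads that really burn CPU from threads that are mostly waiting.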

Thank you for the response. But I am not sure, as transport operations are failing very often. I am evaluating the cluster with the thread pools left at their default settings.
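
One way to check whether the default thread pools are actually the bottleneck is to look at the queue and rejected counts per pool, for example from GET _cat/thread_pool?v&h=node_name,name,active,queue,rejected. Below is a rough Python sketch that filters such output down to the pools with queued or rejected tasks. It assumes exactly those five columns were requested, and the sample rows are invented for illustration; they are not from this cluster.

```python
def saturated_pools(cat_output):
    """Return (node, pool, queue, rejected) rows where queue or rejected > 0."""
    lines = [ln for ln in cat_output.strip().splitlines() if ln.strip()]
    header = lines[0].split()
    rows = []
    for line in lines[1:]:
        # Pair each whitespace-separated value with its column name.
        row = dict(zip(header, line.split()))
        queue = int(row["queue"])
        rejected = int(row["rejected"])
        if queue > 0 or rejected > 0:
            rows.append((row["node_name"], row["name"], queue, rejected))
    return rows


# Hypothetical sample, invented for illustration; not taken from this cluster.
sample = """
node_name name    active queue rejected
data-0    refresh 1      0     0
data-0    write   4      12    3
data-1    search  0      0     0
"""
print(saturated_pools(sample))
```

If this comes back empty on every node, the pools are keeping up and changing their sizes is unlikely to help.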

Are you potentially using some type of very slow networked storage?

All nodes are pods backed by persistent storage from Datafabric, and the data mount is 10GB by default.

I have no experience with Datafabric. Would it be possible to get the iowait and/or iostats the pods are experiencing, e.g. iostat -x?

Unfortunately, I can't get the output of iostat -x because the executable is not found in the path.

OCI runtime exec failed: exec failed: container_linux.go:349: starting container process caused "exec: "iostat": executable file not found in $PATH": unknown
command terminated with exit code 126

Also, I deployed the ELK environment using an operator.
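
When iostat is not in the image, a rough iowait figure can still be derived from two readings of the aggregate cpu line in /proc/stat, which only needs cat inside the pod. The Python sketch below is meant to be run wherever Python is available, on the two captured lines. It assumes the standard Linux column order (user, nice, system, idle, iowait, irq, softirq, steal), the counters in the example are made up, and inside a container /proc/stat normally reflects the whole host rather than the single pod.

```python
def iowait_percent(cpu_line_before, cpu_line_after):
    """Percentage of CPU time spent in iowait between two 'cpu' lines of /proc/stat."""

    def fields(line):
        parts = line.split()
        if parts[0] != "cpu":
            raise ValueError("expected the aggregate 'cpu' line")
        # user nice system idle iowait irq softirq steal; the guest columns are
        # already counted inside user/nice, so they are left out of the total.
        return [int(x) for x in parts[1:9]]

    before, after = fields(cpu_line_before), fields(cpu_line_after)
    deltas = [a - b for a, b in zip(after, before)]
    total = sum(deltas)
    if total <= 0:
        return 0.0
    return 100.0 * deltas[4] / total  # index 4 is the iowait column


# Made-up counters for illustration only.
before = "cpu  1000 0 500 8000 200 0 50 0 0 0"
after = "cpu  1100 0 550 8300 700 0 50 0 0 0"
print(round(iowait_percent(before, after), 1))
```

Taking the two readings a few seconds apart while the refresh threads look busy should show whether the nodes are mostly waiting on storage.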
