Hello Guys,
Hope you are all doing well!
We are using a Fluent-bit + Elasticsearch + Kibana stack for logging our Kubernetes containers. However, the CPU usage sometimes goes so high that searching from Kibana stops working.
Setup information:
> Kibana & Elasticsearch version - 7.9.2
> Elasticsearch cluster - 5 master and 5 data nodes, running in a different namespace on the same Kubernetes cluster.
> Fluent-bit (1.7) - collects the logs
> Storage - standard-disk persistent volumes of 1.5 TB attached to each data node, 7.5 TB in total.
> Number of indices - 20. (Fluent-bit gathers around 300 to 450 GB of Kubernetes logs daily from around 20 nodes; each day's logs go into a single date-based index, and only the last 20 days of indices are retained.)
> Shards - 2 per index (1 primary + 1 replica, i.e. 20 primaries and 20 replicas across the 20 indices, plus a few system-generated indices).
> Total number of docs - 7,196,678,040 (around 359,833,902 per index).
> Elasticsearch usage: screenshot attached.
> Kibana memory usage - 470 MB / 1 GB
> Single index pattern with 60 fields.
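For reference, the hot threads dump below was captured with the nodes hot threads API; the interval=500ms and busiestThreads=3 shown in its header correspond to parameters roughly along these lines:

```
GET /_nodes/hot_threads?threads=3&interval=500ms&ignore_idle_threads=true
```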
```
::: {logging-es-masters-2}{zM6G5JfuR8eN5lXwC47oWA}{Xf0Vkm3ITqqvEnxfZFpvPQ}{10.32.30.13}{10.32.30.13:9300}{mr}{xpack.installed=true, transform.node=false}
Hot threads at 2021-09-27T10:07:57.669Z, interval=500ms, busiestThreads=3, ignoreIdleThreads=true:
0.7% (3.2ms out of 500ms) cpu usage by thread 'elasticsearch[logging-es-masters-2][transport_worker][T#1]'
2/10 snapshots sharing following 3 elements
io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989)
io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
java.base@15/java.lang.Thread.run(Thread.java:832)
::: {logging-es-data-0}{woah-pXsRNaSszo62ZxzCw}{RI4xPzumRqapnXK08yyOvA}{10.32.25.18}{10.32.25.18:9300}{dirt}{xpack.installed=true, transform.node=true}
Hot threads at 2021-09-27T10:07:57.719Z, interval=500ms, busiestThreads=3, ignoreIdleThreads=true:
93.6% (467.9ms out of 500ms) cpu usage by thread 'elasticsearch[logging-es-data-0][generic][T#2]'
2/10 snapshots sharing following 42 elements
app//org.apache.lucene.search.ConjunctionDISI.doNext(ConjunctionDISI.java:200)
app//org.apache.lucene.search.ConjunctionDISI.nextDoc(ConjunctionDISI.java:240)
app//org.apache.lucene.search.Weight$DefaultBulkScorer.scoreAll(Weight.java:265)
app//org.elasticsearch.indices.recovery.RecoverySourceHandler$OperationBatchSender.lambda$executeChunkRequest$1(RecoverySourceHandler.java:790)
app//org.elasticsearch.indices.recovery.RecoverySourceHandler$OperationBatchSender$$Lambda$5788/0x000000080193d620.accept(Unknown Source)
app//org.elasticsearch.action.ActionListener$3.onResponse(ActionListener.java:113)
app//org.elasticsearch.action.ActionListener$4.onResponse(ActionListener.java:163)
app//org.elasticsearch.action.ActionListener$6.onResponse(ActionListener.java:282)
app//org.elasticsearch.action.support.RetryableAction$RetryingListener.onResponse(RetryableAction.java:136)
app//org.elasticsearch.action.ActionListenerResponseHandler.handleResponse(ActionListenerResponseHandler.java:54)
app//org.elasticsearch.transport.TransportService$ContextRestoreResponseHandler.handleResponse(TransportService.java:1162)
app//org.elasticsearch.transport.TransportService$ContextRestoreResponseHandler.handleResponse(TransportService.java:1162)
app//org.elasticsearch.transport.InboundHandler$1.doRun(InboundHandler.java:213)
app//org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:737)
app//org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37)
java.base@15/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1130)
java.base@15/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:630)
java.base@15/java.lang.Thread.run(Thread.java:832)
5/10 snapshots sharing following 43 elements
```
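The per-node CPU, heap and load figures below come from the cat nodes API; a request selecting the columns shown would look roughly like:

```
GET /_cat/nodes?v&h=ip,heap.percent,ram.percent,cpu,load_1m,load_5m,load_15m,node.role,master,name
```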
```
ip heap.percent ram.percent cpu load_1m load_5m load_15m node.role master name
00.02.05.19 50 10 98 21.23 17.47 13.52 mr - logging-es-masters-0
00.32.25.18 62 78 98 21.23 17.47 13.52 dirt - logging-es-data-0
00.02.05.19 23 10 41 3.04 2.86 3.15 mr - logging-es-masters-3
00.02.05.19 48 53 41 3.04 2.86 3.15 dirt - logging-es-data-2
00.02.05.19 26 87 34 1.04 1.51 1.76 dirt - logging-es-data-3
00.02.05.19 56 10 32 1.04 1.51 1.76 mr - logging-es-masters-2
00.02.05.19 44 11 12 0.57 0.57 0.81 mr - logging-es-masters-1
00.02.05.19 38 54 12 0.57 0.57 0.81 dirt - logging-es-data-1
00.02.05.19 60 10 100 25.23 19.94 13.10 mr * logging-es-masters-4
00.02.05.19 32 62 100 25.23 19.94 13.10 dirt - logging-es-data-4
```
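If it helps with the diagnosis, I can also share per-shard sizes; for example via the cat shards API (the column selection here is just a suggestion):

```
GET /_cat/shards?v&h=index,shard,prirep,store,node&s=store:desc
```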