High CPU usage (100%) on Elasticsearch servers

Hello, I have a problem with Elasticsearch: it uses 100% of the CPU. I am running ES 7.9.3 with:

3 nodes.
6 CPU(s)
-Xms8g
-Xmx8g

Kibana runs slowly and the cluster health is yellow.

Thanks for your help.

Regards,
Rali

I doubt anyone will be able to help you based on that very limited information. It would help if you could provide further details:

  • What type of hardware and storage is the cluster hosted on?
  • How much data is held in the cluster?
  • How many indices and shards do you have in the cluster?
  • What is the use-case? Is it index or search heavy?
  • What load is the cluster under?
  • What is the output of the hot threads API?

  • What type of hardware and storage is the cluster hosted on? Virtual machine, 2 TB of storage per node with 500 GB used

  • How much data is held in the cluster? 1,500,000,000 Documents

  • How many indices and shards do you have in the cluster?
    Indices: 1100
    Total shards: 3657
    Unassigned shards: 328

  • What is the use-case? Is it index or search heavy? Cluster health is yellow and the workload is search heavy

  • What is the output of the hot threads API (https://www.elastic.co/guide/en/elasticsearch/reference/7.10/cluster-nodes-hot-threads.html)?

::: {node1}{xxxxxxxxxxxxxxxxxxxxxx}{xxxxxxxxxxxxxxxxxxxx}{x.x.x.x}{x.x.x.x:9300}{dilmrt}{ml.machine_memory=16656596992, xpack.installed=true, transform.node=true, ml.max_open_jobs=20}
Hot threads at 2021-01-08T09:15:32.272Z, interval=500ms, busiestThreads=3, ignoreIdleThreads=true:

::: {node2}{xxxxxxxxxxxxxxxxxxxxxxxxxxx}{x.x.x.x}{x.x.x.x:9300}{dilmrt}{ml.machine_memory=16656596992, ml.max_open_jobs=20, xpack.installed=true, transform.node=true}
Hot threads at 2021-01-08T09:15:32.274Z, interval=500ms, busiestThreads=3, ignoreIdleThreads=true:

::: {node3}{xxxxxxxxxxxxxxxxxxxxxxxx}{x.x.x.x}{x.x.x.x:9300}{dilmrt}{ml.machine_memory=16656596992, ml.max_open_jobs=20, xpack.installed=true, transform.node=true}
Hot threads at 2021-01-08T09:15:32.448Z, interval=500ms, busiestThreads=3, ignoreIdleThreads=true:

Thanks for your help.
Best Regards,
Rali

For 1.5TB of data you have far too many shards in the cluster. Having lots of small shards is very inefficient. Please read this blog post for some practical guidelines.
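
To put rough numbers on this, here is a back-of-the-envelope calculation using the figures reported earlier in the thread (3 nodes, about 500 GB used per node, 3657 shards, 8 GB of heap). The ceiling of 20 shards per GB of heap is an assumption: it is a commonly cited rule of thumb from Elastic's sharding guidance for 7.x, not something stated in this thread, so treat the result as an estimate only.

```python
# Figures reported in this thread.
nodes = 3
used_gb_per_node = 500      # disk used per node, as reported
total_shards = 3657         # total shard count, as reported
heap_gb_per_node = 8        # from -Xms8g / -Xmx8g

# Assumption (not from this thread): rule of thumb of at most
# about 20 shards per GB of heap on each node.
max_shards_per_gb_heap = 20

total_data_gb = nodes * used_gb_per_node
avg_shard_size_mb = total_data_gb / total_shards * 1000
shards_per_node = total_shards / nodes
shard_budget_per_node = heap_gb_per_node * max_shards_per_gb_heap

print(f"total data on disk:      {total_data_gb} GB")
print(f"average shard size:      {avg_shard_size_mb:.0f} MB")
print(f"shards per node:         {shards_per_node:.0f}")
print(f"rule-of-thumb budget:    {shard_budget_per_node} shards per node")
```

Under these assumptions the average shard is a few hundred megabytes, and each node would carry several times more shards than the rule of thumb suggests for an 8 GB heap.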

This does not answer the question. The output you provided also does not seem to be the full output from the hot threads API. Please provide the full output, which should be quite lengthy.

Please also describe what kind of load the cluster is under.
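
As a minimal sketch of how both pieces of information could be collected, the helpers below build the request URLs for the hot threads API (asking for more than the default 3 threads) and for the cat thread pool API, which gives a rough picture of indexing and search load. The base URL, the credentials and the helper names are hypothetical; the cluster in this thread has security enabled, so some form of authentication would be needed.

```python
import base64
from urllib.parse import urlencode
from urllib.request import Request, urlopen


def hot_threads_url(base_url, threads=3, interval="500ms", ignore_idle=True):
    """Build a GET /_nodes/hot_threads URL with explicit parameters."""
    query = urlencode({
        "threads": threads,
        "interval": interval,
        "ignore_idle_threads": str(ignore_idle).lower(),
    })
    return f"{base_url}/_nodes/hot_threads?{query}"


def thread_pool_url(base_url):
    """Build a GET /_cat/thread_pool URL for the write and search pools."""
    query = urlencode(
        {"v": "true", "h": "node_name,name,active,queue,rejected"},
        safe=",",  # keep the column list readable in the URL
    )
    return f"{base_url}/_cat/thread_pool/write,search?{query}"


def fetch_text(url, username=None, password=None):
    """Fetch a plain-text response, optionally with HTTP basic auth."""
    request = Request(url)
    if username is not None:
        token = base64.b64encode(f"{username}:{password}".encode()).decode()
        request.add_header("Authorization", f"Basic {token}")
    with urlopen(request) as response:
        return response.read().decode("utf-8")


# Hypothetical base URL; only the URLs are printed here, nothing is fetched.
base = "http://localhost:9200"
print(hot_threads_url(base, threads=10))
print(thread_pool_url(base))
```

Calling `fetch_text(hot_threads_url(base, threads=10), "user", "password")` against a reachable node would return the full text output, which can then be pasted into the thread.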


What is the use-case? Is it index or search heavy? Index heavy.

GET /_nodes/hot_threads

::: {node1}{xxxxxxxxxxxx}{xxxxxxxx}{x.x.x.x}{x.x.x.x:9300}{dilmrt}{ml.machine_memory=16656596992, xpack.installed=true, transform.node=true, ml.max_open_jobs=20}
Hot threads at 2021-01-11T11:26:41.136Z, interval=500ms, busiestThreads=3, ignoreIdleThreads=true:

73.3% (366.6ms out of 500ms) cpu usage by thread 'elasticsearch[node-1.local][transport_worker][T#1]'
8/10 snapshots sharing following 3 elements
io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989)
io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
java.base@15/java.lang.Thread.run(Thread.java:832)

68.0% (340.1ms out of 500ms) cpu usage by thread 'elasticsearch[node1][transport_worker][T#2]'
2/10 snapshots sharing following 83 elements
org.elasticsearch.xpack.security.authz.RBACEngine.loadAuthorizedIndices(RBACEngine.java:352)
org.elasticsearch.xpack.security.authz.AuthorizationService.lambda$authorizeAction$5(AuthorizationService.java:269)
org.elasticsearch.xpack.security.authz.AuthorizationService$$Lambda$4798/0x00000008017d39a0.getAsync(Unknown Source)
org.elasticsearch.xpack.security.authz.AuthorizationService$CachingAsyncSupplier.getAsync(AuthorizationService.java:676)
org.elasticsearch.xpack.security.authz.AuthorizationService.lambda$authorizeAction$8(AuthorizationService.java:272)
org.elasticsearch.xpack.security.authz.AuthorizationService$$Lambda$4799/0x00000008017d3bc8.getAsync(Unknown Source)
org.elasticsearch.xpack.security.authz.AuthorizationService$CachingAsyncSupplier.getAsync(AuthorizationService.java:676)
org.elasticsearch.xpack.security.authz.RBACEngine.lambda$authorizeIndexAction$4(RBACEngine.java:313)
org.elasticsearch.xpack.security.authz.RBACEngine$$Lambda$4802/0x00000008017d44a8.accept(Unknown Source)
app//org.elasticsearch.action.ActionListener$1.onResponse(ActionListener.java:63)

 7/10 snapshots sharing following 3 elements
   io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989)
   io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
   java.base@15/java.lang.Thread.run(Thread.java:832)

::: {node2}{xxxxxxxxxxx}{xxxxxxxxxxxxxxxxx}{x.x.x.x}{x.x.x.x:9300}{dilmrt}{ml.machine_memory=16656596992, ml.max_open_jobs=20, xpack.installed=true, transform.node=true}
Hot threads at 2021-01-11T11:26:41.141Z, interval=500ms, busiestThreads=3, ignoreIdleThreads=true:

36.3% (181.4ms out of 500ms) cpu usage by thread 'elasticsearch[node2][transport_worker][T#6]'
2/10 snapshots sharing following 83 elements
org.elasticsearch.xpack.security.authz.RBACEngine.loadAuthorizedIndices(RBACEngine.java:352)
org.elasticsearch.xpack.security.authz.AuthorizationService.lambda$authorizeAction$5(AuthorizationService.java:269)
org.elasticsearch.xpack.security.authz.AuthorizationService$$Lambda$4974/0x0000000801825a70.getAsync(Unknown Source)
org.elasticsearch.xpack.security.authz.AuthorizationService$CachingAsyncSupplier.getAsync(AuthorizationService.java:676)
org.elasticsearch.xpack.security.authz.AuthorizationService.lambda$authorizeAction$8(AuthorizationService.java:272)
org.elasticsearch.xpack.security.authz.AuthorizationService$$Lambda$4976/0x0000000801825c98.getAsync(Unknown Source)
org.elasticsearch.xpack.security.authz.AuthorizationService$CachingAsyncSupplier.getAsync(AuthorizationService.java:676)
org.elasticsearch.xpack.security.authz.RBACEngine.lambda$authorizeIndexAction$4(RBACEngine.java:313)
org.elasticsearch.xpack.security.authz.RBACEngine$$Lambda$4984/0x0000000801826330.accept(Unknown Source)
app//org.elasticsearch.action.ActionListener$1.onResponse(ActionListener.java:63)

 3/10 snapshots sharing following 9 elements
   java.base@15/sun.nio.ch.EPoll.wait(Native Method)
   java.base@15/sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:120)
   java.base@15/sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:129)
   java.base@15/sun.nio.ch.SelectorImpl.select(SelectorImpl.java:146)
   io.netty.channel.nio.NioEventLoop.select(NioEventLoop.java:803)
   io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:457)
   io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989)
   io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
   java.base@15/java.lang.Thread.run(Thread.java:832)
 unique snapshot

27.0% (135.2ms out of 500ms) cpu usage by thread 'elasticsearch[node2][transport_worker][T#2]'
9/10 snapshots sharing following 3 elements
io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989)
io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
java.base@15/java.lang.Thread.run(Thread.java:832)

15.1% (75.5ms out of 500ms) cpu usage by thread 'elasticsearch[node2][transport_worker][T#5]'
2/10 snapshots sharing following 83 elements
org.elasticsearch.xpack.security.authz.RBACEngine.loadAuthorizedIndices(RBACEngine.java:352)
org.elasticsearch.xpack.security.authz.AuthorizationService.lambda$authorizeAction$5(AuthorizationService.java:269)
org.elasticsearch.xpack.security.authz.AuthorizationService$$Lambda$4974/0x0000000801825a70.getAsync(Unknown Source)
org.elasticsearch.xpack.security.authz.AuthorizationService$CachingAsyncSupplier.getAsync(AuthorizationService.java:676)
org.elasticsearch.xpack.security.authz.AuthorizationService.lambda$authorizeAction$8(AuthorizationService.java:272)
org.elasticsearch.xpack.security.authz.AuthorizationService$$Lambda$4976/0x0000000801825c98.getAsync(Unknown Source)
org.elasticsearch.xpack.security.authz.AuthorizationService$CachingAsyncSupplier.getAsync(AuthorizationService.java:676)
org.elasticsearch.xpack.security.authz.RBACEngine.lambda$authorizeIndexAction$4(RBACEngine.java:313)
org.elasticsearch.xpack.security.authz.RBACEngine$$Lambda$4984/0x0000000801826330.accept(Unknown Source)
app//org.elasticsearch.action.ActionListener$1.onResponse(ActionListener.java:63)

::: {node3}{xxxxxxxxxxxxxxx}{xxxxxxxxxxxxxx}{x.x.x.x}{x.x.x.x:9300}{dilmrt}{ml.machine_memory=16656596992, ml.max_open_jobs=20, xpack.installed=true, transform.node=true}
Hot threads at 2021-01-11T11:26:41.519Z, interval=500ms, busiestThreads=3, ignoreIdleThreads=true:

44.8% (223.9ms out of 500ms) cpu usage by thread 'elasticsearch[node3][transport_worker][T#5]'
2/10 snapshots sharing following 36 elements
java.base@15/sun.nio.ch.FileDispatcherImpl.write0(Native Method)
java.base@15/sun.nio.ch.SocketDispatcher.write(SocketDispatcher.java:62)
java.base@15/sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:113)
java.base@15/sun.nio.ch.IOUtil.write(IOUtil.java:58)
java.base@15/sun.nio.ch.IOUtil.write(IOUtil.java:50)
java.base@15/sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:506)
org.elasticsearch.transport.CopyBytesSocketChannel.writeToSocketChannel(CopyBytesSocketChannel.java:136)
org.elasticsearch.transport.CopyBytesSocketChannel.doWrite(CopyBytesSocketChannel.java:104)

 8/10 snapshots sharing following 9 elements
   java.base@15/sun.nio.ch.EPoll.wait(Native Method)
   java.base@15/sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:120)
   java.base@15/sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:129)
   java.base@15/sun.nio.ch.SelectorImpl.select(SelectorImpl.java:146)
   io.netty.channel.nio.NioEventLoop.select(NioEventLoop.java:803)
   io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:457)
   io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989)
   io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
   java.base@15/java.lang.Thread.run(Thread.java:832)

31.7% (158.2ms out of 500ms) cpu usage by thread 'elasticsearch[node3][transport_worker][T#1]'
10/10 snapshots sharing following 328 elements
org.elasticsearch.xpack.security.authz.RBACEngine.loadAuthorizedIndices(RBACEngine.java:352)
org.elasticsearch.xpack.security.authz.AuthorizationService.lambda$authorizeAction$5(AuthorizationService.java:269)
org.elasticsearch.xpack.security.authz.AuthorizationService$$Lambda$4871/0x00000008017f4020.getAsync(Unknown Source)
org.elasticsearch.xpack.security.authz.AuthorizationService$CachingAsyncSupplier.getAsync(AuthorizationService.java:676)
org.elasticsearch.xpack.security.authz.AuthorizationService.lambda$authorizeAction$8(AuthorizationService.java:272)
org.elasticsearch.xpack.security.authz.AuthorizationService$$Lambda$4872/0x00000008017f4248.getAsync(Unknown Source)
org.elasticsearch.xpack.security.authz.AuthorizationService$CachingAsyncSupplier.getAsync(AuthorizationService.java:676)
org.elasticsearch.xpack.security.authz.RBACEngine.lambda$authorizeIndexAction$4(RBACEngine.java:313)
org.elasticsearch.xpack.security.authz.RBACEngine$$Lambda$4875/0x00000008017f4b28.accept(Unknown Source)
app//org.elasticsearch.action.ActionListener$1.onResponse(ActionListener.java:63)
org.elasticsearch.xpack.security.authz.RBACEngine.authorizeIndexActionName(RBACEngine.java:337)

23.6% (118ms out of 500ms) cpu usage by thread 'elasticsearch[node3][transport_worker][T#6]'
unique snapshot
app//org.elasticsearch.common.io.stream.ReleasableBytesStreamOutput.ensureCapacity(ReleasableBytesStreamOutput.java:60)
app//org.elasticsearch.common.io.stream.BytesStreamOutput.writeBytes(BytesStreamOutput.java:90)
app//org.elasticsearch.common.io.stream.StreamOutput.writeBytes(StreamOutput.java:184)
app//org.elasticsearch.common.io.stream.StreamOutput.writeString(StreamOutput.java:459)
app//org.elasticsearch.common.util.concurrent.ThreadContext$ThreadContextStruct.writeTo(ThreadContext.java:654)
app//org.elasticsearch.common.util.concurrent.ThreadContext$ThreadContextStruct.access$800(ThreadContext.java:488)
app//org.elasticsearch.common.util.concurrent.ThreadContext.lambda$captureAsWriteable$2(ThreadContext.java:147)
app//org.elasticsearch.common.util.concurrent.ThreadContext$$Lambda$4533/0x00000008016b8518.writeTo(Unknown Source)
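
Output like the above is long, so a small script can help to see at a glance which threads are busiest and where they spend their time. The following is a rough sketch that assumes the text format shown in this thread; the function name is hypothetical and the sample is a shortened excerpt of the output above (the node header is abbreviated).

```python
import re

# Matches lines such as:
# 73.3% (366.6ms out of 500ms) cpu usage by thread 'elasticsearch[...]'
HEADER = re.compile(
    r"^(?P<pct>\d+(?:\.\d+)?)% \([\d.]+ms out of [\d.]+ms\) "
    r"cpu usage by thread '(?P<thread>[^']+)'"
)
# Lines that describe snapshot grouping rather than a stack frame.
SNAPSHOT = re.compile(
    r"^(\d+/\d+ snapshots sharing following \d+ elements|unique snapshot)$"
)


def summarize_hot_threads(text):
    """Return one entry per hot thread, busiest first, with its top frame."""
    entries = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        header = HEADER.match(line)
        if header:
            current = {
                "cpu_pct": float(header["pct"]),
                "thread": header["thread"],
                "top_frame": None,
            }
            entries.append(current)
        elif line.startswith(":::") or line.startswith("Hot threads at"):
            current = None  # a new node section starts
        elif current and current["top_frame"] is None and line and not SNAPSHOT.match(line):
            current["top_frame"] = line  # first stack frame after the header
    return sorted(entries, key=lambda e: e["cpu_pct"], reverse=True)


SAMPLE = """\
::: {node1}
Hot threads at 2021-01-11T11:26:41.136Z, interval=500ms, busiestThreads=3, ignoreIdleThreads=true:

73.3% (366.6ms out of 500ms) cpu usage by thread 'elasticsearch[node-1.local][transport_worker][T#1]'
8/10 snapshots sharing following 3 elements
io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989)

68.0% (340.1ms out of 500ms) cpu usage by thread 'elasticsearch[node1][transport_worker][T#2]'
2/10 snapshots sharing following 83 elements
org.elasticsearch.xpack.security.authz.RBACEngine.loadAuthorizedIndices(RBACEngine.java:352)
"""

for entry in summarize_hot_threads(SAMPLE):
    print(f"{entry['cpu_pct']:5.1f}%  {entry['thread']}  ->  {entry['top_frame']}")
```

Run over the full output, a summary like this makes it easier to see how many of the busy transport_worker threads have `RBACEngine.loadAuthorizedIndices` as their top frame.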

Can you elaborate on this question?