We have just started using the Elastic Cloud solution.
After switching part of our services to it, we started to see performance degradation.
My deployment has 2 HOT data nodes and 2 nodes for WARM storage.
Instance #0: Healthy, v7.12.0, 4 GB RAM, azure.data.highio.l32sv2
  Roles: data_hot, data_content, master, coordinating, ingest

Instance #1: Healthy, v7.12.0, 4 GB RAM, azure.data.highio.l32sv2
  Roles: data_hot, data_content, master eligible, coordinating, ingest
Monitoring reported 100% CPU utilization on node 1, and I can see from the performance graphs that we usually burn through all of our CPU credits.
Did I plan the architecture badly? Or is the only way to fix it to increase the cluster size by switching to bigger HOT data nodes?
Please help me understand what's going on.
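For reference, here is what I am checking alongside the monitoring graphs (assuming the standard 7.x cat APIs; the column selection is just my choice):

```
GET _cat/thread_pool/write?v&h=node_name,name,active,queue,rejected

GET _cat/shards?v
```

If the write pool shows rejections, or the shards sit unevenly across the two hot nodes, that would point at indexing pressure rather than search load.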
Trying to find the most CPU-intensive tasks, I ran:
GET /_nodes/instance-0000000001/hot_threads
::: {instance-0000000001}{NPwmqbLnQt-bS9RI-vky6Q}{a-1hnleJRGyyJvP_FIa5Pg}{10.46.24.43}{10.46.24.43:19576}{himrst}{logical_availability_zone=zone-1, server_name=instance-0000000001.e61c85c0f72e451e85c281a6c4db29c5, availability_zone=westeurope-2, xpack.installed=true, data=hot, instance_configuration=azure.data.highio.l32sv2, transform.node=true, region=unknown-region}
   Hot threads at 2021-04-21T15:06:02.888Z, interval=500ms, busiestThreads=3, ignoreIdleThreads=true:

   24.8% (123.8ms out of 500ms) cpu usage by thread 'elasticsearch[instance-0000000001][write][T#1]'
     3/10 snapshots sharing following 22 elements
       app//org.elasticsearch.index.mapper.DocumentParser.innerParseObject(DocumentParser.java:434)
       app//org.elasticsearch.index.mapper.DocumentParser.parseObjectOrNested(DocumentParser.java:405)
       app//org.elasticsearch.index.mapper.DocumentParser.internalParseDocument(DocumentParser.java:111)
       app//org.elasticsearch.index.mapper.DocumentParser.parseDocument(DocumentParser.java:69)
       app//org.elasticsearch.index.mapper.DocumentParser.parseDocument(DocumentParser.java:51)
       app//org.elasticsearch.index.mapper.DocumentMapper.parse(DocumentMapper.java:121)
       app//org.elasticsearch.index.shard.IndexShard.prepareIndex(IndexShard.java:852)
       app//org.elasticsearch.index.shard.IndexShard.applyIndexOperation(IndexShard.java:829)
       app//org.elasticsearch.index.shard.IndexShard.applyIndexOperationOnReplica(IndexShard.java:808)
       app//org.elasticsearch.action.bulk.TransportShardBulkAction.performOpOnReplica(TransportShardBulkAction.java:469)
       app//org.elasticsearch.action.bulk.TransportShardBulkAction.performOnReplica(TransportShardBulkAction.java:451)
       app//org.elasticsearch.action.bulk.TransportShardBulkAction.lambda$dispatchedShardOperationOnReplica$5(TransportShardBulkAction.java:416)
       app//org.elasticsearch.action.bulk.TransportShardBulkAction$$Lambda$6451/0x0000000801ce28d8.get(Unknown Source)
       app//org.elasticsearch.action.ActionListener.completeWith(ActionListener.java:329)
       app//org.elasticsearch.action.bulk.TransportShardBulkAction.dispatchedShardOperationOnReplica(TransportShardBulkAction.java:415)
       app//org.elasticsearch.action.bulk.TransportShardBulkAction.dispatchedShardOperationOnReplica(TransportShardBulkAction.java:74)
       app//org.elasticsearch.action.support.replication.TransportWriteAction$2.doRun(TransportWriteAction.java:193)
       app//org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:732)
       app//org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:26)
       java.base@15.0.1/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1130)
       java.base@15.0.1/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:630)
       java.base@15.0.1/java.lang.Thread.run(Thread.java:832)
     2/10 snapshots sharing following 14 elements
       app//org.elasticsearch.index.shard.IndexShard.applyIndexOperationOnReplica(IndexShard.java:808)
       app//org.elasticsearch.action.bulk.TransportShardBulkAction.lambda$dispatchedShardOperationOnReplica$5(TransportShardBulkAction.java:416)
       app//org.elasticsearch.action.bulk.TransportShardBulkAction$$Lambda$6451/0x0000000801ce28d8.get(Unknown Source)
       app//org.elasticsearch.action.ActionListener.completeWith(ActionListener.java:329)
       app//org.elasticsearch.action.bulk.TransportShardBulkAction.dispatchedShardOperationOnReplica(TransportShardBulkAction.java:415)
       app//org.elasticsearch.action.bulk.TransportShardBulkAction.dispatchedShardOperationOnReplica(TransportShardBulkAction.java:74)
       app//org.elasticsearch.action.support.replication.TransportWriteAction$2.doRun(TransportWriteAction.java:193)
       app//org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:732)
       app//org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:26)
       java.base@15.0.1/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1130)
       java.base@15.0.1/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:630)
       java.base@15.0.1/java.lang.Thread.run(Thread.java:832)
     5/10 snapshots sharing following 10 elements
       java.base@15.0.1/jdk.internal.misc.Unsafe.park(Native Method)
       java.base@15.0.1/java.util.concurrent.locks.LockSupport.park(LockSupport.java:211)
       java.base@15.0.1/java.util.concurrent.LinkedTransferQueue.awaitMatch(LinkedTransferQueue.java:743)
       java.base@15.0.1/java.util.concurrent.LinkedTransferQueue.xfer(LinkedTransferQueue.java:684)
       java.base@15.0.1/java.util.concurrent.LinkedTransferQueue.take(LinkedTransferQueue.java:1366)
       app//org.elasticsearch.common.util.concurrent.SizeBlockingQueue.take(SizeBlockingQueue.java:154)
       java.base@15.0.1/java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1056)
       java.base@15.0.1/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1116)
       java.base@15.0.1/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:630)
       java.base@15.0.1/java.lang.Thread.run(Thread.java:832)

   22.3% (111.2ms out of 500ms) cpu usage by thread 'elasticsearch[instance-0000000001][write][T#2]'
     3/10 snapshots sharing following 17 elements
       app//org.elasticsearch.index.engine.InternalEngine.index(InternalEngine.java:951)
       app//org.elasticsearch.index.shard.IndexShard.index(IndexShard.java:872)
       app//org.elasticsearch.index.shard.IndexShard.applyIndexOperation(IndexShard.java:844)
       app//org.elasticsearch.index.shard.IndexShard.applyIndexOperationOnReplica(IndexShard.java:808)
       app//org.elasticsearch.action.bulk.TransportShardBulkAction.performOpOnReplica(TransportShardBulkAction.java:469)
       app//org.elasticsearch.action.bulk.TransportShardBulkAction.performOnReplica(TransportShardBulkAction.java:451)
       app//org.elasticsearch.action.bulk.TransportShardBulkAction.lambda$dispatchedShardOperationOnReplica$5(TransportShardBulkAction.java:416)
       app//org.elasticsearch.action.bulk.TransportShardBulkAction$$Lambda$6451/0x0000000801ce28d8.get(Unknown Source)
       app//org.elasticsearch.action.ActionListener.completeWith(ActionListener.java:329)
       app//org.elasticsearch.action.bulk.TransportShardBulkAction.dispatchedShardOperationOnReplica(TransportShardBulkAction.java:415)
       app//org.elasticsearch.action.bulk.TransportShardBulkAction.dispatchedShardOperationOnReplica(TransportShardBulkAction.java:74)
       app//org.elasticsearch.action.support.replication.TransportWriteAction$2.doRun(TransportWriteAction.java:193)
       java.base@15.0.1/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:630)
       java.base@15.0.1/java.lang.Thread.run(Thread.java:832)
     2/10 snapshots sharing following 8 elements
       app//org.elasticsearch.action.bulk.TransportShardBulkAction.dispatchedShardOperationOnReplica(TransportShardBulkAction.java:415)
       app//org.elasticsearch.action.bulk.TransportShardBulkAction.dispatchedShardOperationOnReplica(TransportShardBulkAction.java:74)
       app//org.elasticsearch.action.support.replication.TransportWriteAction$2.doRun(TransportWriteAction.java:193)
       app//org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:732)
       app//org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:26)
       java.base@15.0.1/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1130)
       java.base@15.0.1/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:630)
       java.base@15.0.1/java.lang.Thread.run(Thread.java:832)
     5/10 snapshots sharing following 10 elements
       java.base@15.0.1/jdk.internal.misc.Unsafe.park(Native Method)
       java.base@15.0.1/java.util.concurrent.locks.LockSupport.park(LockSupport.java:211)
       java.base@15.0.1/java.util.concurrent.LinkedTransferQueue.awaitMatch(LinkedTransferQueue.java:743)
       java.base@15.0.1/java.util.concurrent.LinkedTransferQueue.xfer(LinkedTransferQueue.java:684)
       java.base@15.0.1/java.util.concurrent.LinkedTransferQueue.take(LinkedTransferQueue.java:1366)
       app//org.elasticsearch.common.util.concurrent.SizeBlockingQueue.take(SizeBlockingQueue.java:154)
       java.base@15.0.1/java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1056)
       java.base@15.0.1/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1116)
       java.base@15.0.1/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:630)
       java.base@15.0.1/java.lang.Thread.run(Thread.java:832)

   12.7% (63.6ms out of 500ms) cpu usage by thread 'elasticsearch[instance-0000000001][transport_worker][T#1]'
     2/10 snapshots sharing following 20 elements
       io.netty.handler.ssl.SslHandler.decodeJdkCompatible(SslHandler.java:1267)
       io.netty.handler.ssl.SslHandler.decode(SslHandler.java:1314)
       io.netty.handler.codec.ByteToMessageDecoder.decodeRemovalReentryProtection(ByteToMessageDecoder.java:501)
       io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:440)
       io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:276)
       io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
       io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
       io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357)
       io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1410)
       io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
       io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
       io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:919)
       io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:163)
       io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:714)
       io.netty.channel.nio.NioEventLoop.processSelectedKeysPlain(NioEventLoop.java:615)
       io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:578)
       io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:493)
       io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989)
       io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
       java.base@15.0.1/java.lang.Thread.run(Thread.java:832)
     7/10 snapshots sharing following 9 elements
       java.base@15.0.1/sun.nio.ch.EPoll.wait(Native Method)
       java.base@15.0.1/sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:120)
       java.base@15.0.1/sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:129)
       java.base@15.0.1/sun.nio.ch.SelectorImpl.select(SelectorImpl.java:146)
       io.netty.channel.nio.NioEventLoop.select(NioEventLoop.java:803)
       io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:457)
       io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989)
       io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
       java.base@15.0.1/java.lang.Thread.run(Thread.java:832)
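If I read this output correctly, most of the sampled CPU time is in the write thread pool, parsing documents while applying bulk operations on replica shards, so indexing load rather than search seems to be eating the CPU. To confirm, I also plan to pull indexing stats (standard node stats APIs, if I understand them right):

```
GET /_nodes/stats/thread_pool/write

GET /_nodes/stats/indices/indexing?human
```

If index_time dominates and the write pool stays saturated, I would look at bulk request sizing and mapping complexity before simply scaling the hot nodes up.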