Hi there!
I've got a strange issue with Elasticsearch CPU load.
Our setup is:
2 servers with Elasticsearch 0.90.2, 8 GB RAM and an Intel Xeon E3-1230 each.
One index per day to hold statistical data, 11 days of data at approximately
5 GB per day (53 GB total). The write load arrives every 10 minutes, with
roughly 6 minutes of idle time between data chunks. 5 shards per index,
1 replica, 220 shards total. Heap size is set to 5 GB on each node.
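(For reference, the shard/replica settings above correspond to an index
template along these lines; the template name and index pattern below are
just illustrative, not our exact config:)
# illustrative template; name and pattern are placeholders
curl -XPUT 'http://web41:9200/_template/statistics_daily' -d '{
  "template": "statistics-*",
  "settings": {
    "index.number_of_shards": 5,
    "index.number_of_replicas": 1
  }
}'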
java -version
java version "1.7.0_17"
Java(TM) SE Runtime Environment (build 1.7.0_17-b02)
Java HotSpot(TM) 64-Bit Server VM (build 23.7-b01, mixed mode)
For the second time now, Elasticsearch is loading the CPU for no apparent reason.
Here's the timeline:
13:55 - restarted node 02
16:20 - rebalancing completed (bigdesk diagram: http://puu.sh/4hcVR.png,
node 01: http://puu.sh/4hcZN.png, node 02: http://puu.sh/4hd0k.png)
On node 01 the field cache size is 3.1 GB, by the way.
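(The same cache numbers should also be retrievable directly from the nodes
stats API, something like the call below; I'm assuming the 0.90.x
/_nodes/stats endpoint here, with the field cache figures appearing in the
per-node indices section:)
# per-node stats; field cache / fielddata sizes should be under "indices"
curl -s 'http://web41:9200/_nodes/stats?pretty=true'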
Here are the CPU load graphs (red = sys, yellow = user, blue = iowait):
node 01: http://puu.sh/4hd4k.png
node 02: http://puu.sh/4hd4R.png
As you can see, after rebalancing the restarted node doesn't actually consume
as much CPU as the node that wasn't restarted.
Hot threads while rebalancing (01 -> 02) is in progress:
curl -s http://web41:9200/_nodes/hot_threads
::: [statistics01][-DYUOLZMSFCiiFSvdh99hg][inet[/192.168.0.41:9300]]
8.1% (40.2ms out of 500ms) cpu usage by thread
'elasticsearch[statistics01][[recovery_stream]][T#14]'
7/10 snapshots sharing following 13 elements
sun.misc.Unsafe.park(Native Method)
java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:834)
java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:994)
java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1303)
org.elasticsearch.common.util.concurrent.BaseFuture$Sync.get(BaseFuture.java:271)
org.elasticsearch.common.util.concurrent.BaseFuture.get(BaseFuture.java:113)
org.elasticsearch.transport.PlainTransportFuture.txGet(PlainTransportFuture.java:45)
org.elasticsearch.transport.PlainTransportFuture.txGet(PlainTransportFuture.java:34)
org.elasticsearch.indices.recovery.RecoverySource$1$1.run(RecoverySource.java:192)
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
java.lang.Thread.run(Thread.java:722)
2/10 snapshots sharing following 7 elements
java.lang.Thread.sleep(Native Method)
java.lang.Thread.sleep(Thread.java:338)
org.apache.lucene.store.RateLimiter$SimpleRateLimiter.pause(RateLimiter.java:112)
org.elasticsearch.indices.recovery.RecoverySource$1$1.run(RecoverySource.java:187)
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
java.lang.Thread.run(Thread.java:722)
unique snapshot
org.elasticsearch.common.compress.lzf.ChunkEncoder.tryCompress(ChunkEncoder.java:236)
org.elasticsearch.common.compress.lzf.ChunkEncoder.encodeAndWriteChunk(ChunkEncoder.java:154)
org.elasticsearch.common.compress.lzf.LZFCompressedStreamOutput.compress(LZFCompressedStreamOutput.java:52)
org.elasticsearch.common.compress.CompressedStreamOutput.writeBytes(CompressedStreamOutput.java:95)
org.elasticsearch.common.io.stream.StreamOutput.write(StreamOutput.java:275)
org.elasticsearch.common.bytes.BytesArray.writeTo(BytesArray.java:106)
org.elasticsearch.common.io.stream.StreamOutput.writeBytesReference(StreamOutput.java:107)
org.elasticsearch.common.io.stream.AdapterStreamOutput.writeBytesReference(AdapterStreamOutput.java:106)
org.elasticsearch.indices.recovery.RecoveryFileChunkRequest.writeTo(RecoveryFileChunkRequest.java:114)
org.elasticsearch.transport.netty.NettyTransport.sendRequest(NettyTransport.java:539)
org.elasticsearch.transport.TransportService.sendRequest(TransportService.java:184)
org.elasticsearch.transport.TransportService.submitRequest(TransportService.java:165)
org.elasticsearch.indices.recovery.RecoverySource$1$1.run(RecoverySource.java:192)
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
java.lang.Thread.run(Thread.java:722)
6.6% (33ms out of 500ms) cpu usage by thread
'elasticsearch[statistics01][[recovery_stream]][T#15]'
10/10 snapshots sharing following 13 elements
sun.misc.Unsafe.park(Native Method)
java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:834)
java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:994)
java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1303)
org.elasticsearch.common.util.concurrent.BaseFuture$Sync.get(BaseFuture.java:271)
org.elasticsearch.common.util.concurrent.BaseFuture.get(BaseFuture.java:113)
org.elasticsearch.transport.PlainTransportFuture.txGet(PlainTransportFuture.java:45)
org.elasticsearch.transport.PlainTransportFuture.txGet(PlainTransportFuture.java:34)
org.elasticsearch.indices.recovery.RecoverySource$1$1.run(RecoverySource.java:192)
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
java.lang.Thread.run(Thread.java:722)
4.3% (21.7ms out of 500ms) cpu usage by thread
'elasticsearch[statistics01][[recovery_stream]][T#16]'
2/10 snapshots sharing following 7 elements
java.lang.Thread.sleep(Native Method)
java.lang.Thread.sleep(Thread.java:338)
org.apache.lucene.store.RateLimiter$SimpleRateLimiter.pause(RateLimiter.java:112)
org.elasticsearch.indices.recovery.RecoverySource$1$1.run(RecoverySource.java:187)
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
java.lang.Thread.run(Thread.java:722)
7/10 snapshots sharing following 13 elements
sun.misc.Unsafe.park(Native Method)
java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:834)
java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:994)
java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1303)
org.elasticsearch.common.util.concurrent.BaseFuture$Sync.get(BaseFuture.java:271)
org.elasticsearch.common.util.concurrent.BaseFuture.get(BaseFuture.java:113)
org.elasticsearch.transport.PlainTransportFuture.txGet(PlainTransportFuture.java:45)
org.elasticsearch.transport.PlainTransportFuture.txGet(PlainTransportFuture.java:34)
org.elasticsearch.indices.recovery.RecoverySource$1$1.run(RecoverySource.java:192)
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
java.lang.Thread.run(Thread.java:722)
unique snapshot
org.elasticsearch.common.compress.lzf.ChunkEncoder.tryCompress(ChunkEncoder.java:236)
org.elasticsearch.common.compress.lzf.ChunkEncoder.encodeAndWriteChunk(ChunkEncoder.java:154)
org.elasticsearch.common.compress.lzf.LZFCompressedStreamOutput.compress(LZFCompressedStreamOutput.java:52)
org.elasticsearch.common.compress.CompressedStreamOutput.writeBytes(CompressedStreamOutput.java:95)
org.elasticsearch.common.io.stream.StreamOutput.write(StreamOutput.java:275)
org.elasticsearch.common.bytes.BytesArray.writeTo(BytesArray.java:106)
org.elasticsearch.common.io.stream.StreamOutput.writeBytesReference(StreamOutput.java:107)
org.elasticsearch.common.io.stream.AdapterStreamOutput.writeBytesReference(AdapterStreamOutput.java:106)
org.elasticsearch.indices.recovery.RecoveryFileChunkRequest.writeTo(RecoveryFileChunkRequest.java:114)
org.elasticsearch.transport.netty.NettyTransport.sendRequest(NettyTransport.java:539)
org.elasticsearch.transport.TransportService.sendRequest(TransportService.java:184)
org.elasticsearch.transport.TransportService.submitRequest(TransportService.java:165)
org.elasticsearch.indices.recovery.RecoverySource$1$1.run(RecoverySource.java:192)
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
java.lang.Thread.run(Thread.java:722)
::: [statistics02][Ec02NRoxT_6CIOhTUw7fJw][inet[/192.168.0.42:9300]]
3.0% (15.2ms out of 500ms) cpu usage by thread
'elasticsearch[statistics02][transport_server_worker][T#11]{New I/O worker
#28}'
10/10 snapshots sharing following 15 elements
sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:228)
sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:81)
sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:87)
sun.nio.ch.SelectorImpl.select(SelectorImpl.java:98)
org.elasticsearch.common.netty.channel.socket.nio.SelectorUtil.select(SelectorUtil.java:64)
org.elasticsearch.common.netty.channel.socket.nio.AbstractNioSelector.select(AbstractNioSelector.java:409)
org.elasticsearch.common.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:206)
org.elasticsearch.common.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:90)
org.elasticsearch.common.netty.channel.socket.nio.NioWorker.run(NioWorker.java:178)
org.elasticsearch.common.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
org.elasticsearch.common.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42)
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
java.lang.Thread.run(Thread.java:722)
2.7% (13.7ms out of 500ms) cpu usage by thread
'elasticsearch[statistics02][transport_server_worker][T#10]{New I/O worker
#27}'
10/10 snapshots sharing following 15 elements
sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:228)
sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:81)
sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:87)
sun.nio.ch.SelectorImpl.select(SelectorImpl.java:98)
org.elasticsearch.common.netty.channel.socket.nio.SelectorUtil.select(SelectorUtil.java:64)
org.elasticsearch.common.netty.channel.socket.nio.AbstractNioSelector.select(AbstractNioSelector.java:409)
org.elasticsearch.common.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:206)
org.elasticsearch.common.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:90)
org.elasticsearch.common.netty.channel.socket.nio.NioWorker.run(NioWorker.java:178)
org.elasticsearch.common.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
org.elasticsearch.common.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42)
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
java.lang.Thread.run(Thread.java:722)
0.8% (3.9ms out of 500ms) cpu usage by thread
'elasticsearch[statistics02][generic][T#19]'
4/10 snapshots sharing following 9 elements
java.lang.Thread.sleep(Native Method)
java.lang.Thread.sleep(Thread.java:338)
org.apache.lucene.store.RateLimiter$SimpleRateLimiter.pause(RateLimiter.java:112)
org.elasticsearch.indices.recovery.RecoveryTarget$FileChunkTransportRequestHandler.messageReceived(RecoveryTarget.java:624)
org.elasticsearch.indices.recovery.RecoveryTarget$FileChunkTransportRequestHandler.messageReceived(RecoveryTarget.java:560)
org.elasticsearch.transport.netty.MessageChannelHandler$RequestHandler.run(MessageChannelHandler.java:265)
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
java.lang.Thread.run(Thread.java:722)
6/10 snapshots sharing following 11 elements
java.io.FileDescriptor.sync(Native Method)
org.apache.lucene.store.FSDirectory.fsync(FSDirectory.java:530)
org.apache.lucene.store.FSDirectory.sync(FSDirectory.java:312)
org.apache.lucene.store.RateLimitedFSDirectory.sync(RateLimitedFSDirectory.java:93)
org.elasticsearch.index.store.Store$StoreDirectory.sync(Store.java:515)
org.elasticsearch.indices.recovery.RecoveryTarget$FileChunkTransportRequestHandler.messageReceived(RecoveryTarget.java:639)
org.elasticsearch.indices.recovery.RecoveryTarget$FileChunkTransportRequestHandler.messageReceived(RecoveryTarget.java:560)
org.elasticsearch.transport.netty.MessageChannelHandler$RequestHandler.run(MessageChannelHandler.java:265)
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
java.lang.Thread.run(Thread.java:722)
Hot threads after rebalancing is complete:
::: [statistics01][-DYUOLZMSFCiiFSvdh99hg][inet[/192.168.0.41:9300]]
1.2% (6ms out of 500ms) cpu usage by thread
'elasticsearch[statistics01][management][T#10]'
10/10 snapshots sharing following 9 elements
sun.misc.Unsafe.park(Native Method)
java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:226)
org.elasticsearch.common.util.concurrent.jsr166y.LinkedTransferQueue.awaitMatch(LinkedTransferQueue.java:702)
org.elasticsearch.common.util.concurrent.jsr166y.LinkedTransferQueue.xfer(LinkedTransferQueue.java:615)
org.elasticsearch.common.util.concurrent.jsr166y.LinkedTransferQueue.poll(LinkedTransferQueue.java:1117)
java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1068)
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1130)
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
java.lang.Thread.run(Thread.java:722)
0.7% (3.6ms out of 500ms) cpu usage by thread
'elasticsearch[statistics01][generic][T#1967]'
10/10 snapshots sharing following 9 elements
sun.misc.Unsafe.park(Native Method)
java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:226)
java.util.concurrent.SynchronousQueue$TransferStack.awaitFulfill(SynchronousQueue.java:460)
java.util.concurrent.SynchronousQueue$TransferStack.transfer(SynchronousQueue.java:359)
java.util.concurrent.SynchronousQueue.poll(SynchronousQueue.java:942)
java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1068)
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1130)
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
java.lang.Thread.run(Thread.java:722)
0.4% (1.9ms out of 500ms) cpu usage by thread
'elasticsearch[statistics01][http_server_worker][T#13]{New I/O worker #47}'
10/10 snapshots sharing following 15 elements
sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:228)
sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:81)
sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:87)
sun.nio.ch.SelectorImpl.select(SelectorImpl.java:98)
org.elasticsearch.common.netty.channel.socket.nio.SelectorUtil.select(SelectorUtil.java:64)
org.elasticsearch.common.netty.channel.socket.nio.AbstractNioSelector.select(AbstractNioSelector.java:409)
org.elasticsearch.common.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:206)
org.elasticsearch.common.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:90)
org.elasticsearch.common.netty.channel.socket.nio.NioWorker.run(NioWorker.java:178)
org.elasticsearch.common.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
org.elasticsearch.common.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42)
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
java.lang.Thread.run(Thread.java:722)
::: [statistics02][Ec02NRoxT_6CIOhTUw7fJw][inet[/192.168.0.42:9300]]
0.2% (858.3micros out of 500ms) cpu usage by thread
'elasticsearch[statistics02][management][T#2]'
10/10 snapshots sharing following 9 elements
sun.misc.Unsafe.park(Native Method)
java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:226)
org.elasticsearch.common.util.concurrent.jsr166y.LinkedTransferQueue.awaitMatch(LinkedTransferQueue.java:702)
org.elasticsearch.common.util.concurrent.jsr166y.LinkedTransferQueue.xfer(LinkedTransferQueue.java:615)
org.elasticsearch.common.util.concurrent.jsr166y.LinkedTransferQueue.poll(LinkedTransferQueue.java:1117)
java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1068)
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1130)
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
java.lang.Thread.run(Thread.java:722)
0.2% (830.7micros out of 500ms) cpu usage by thread
'elasticsearch[statistics02][management][T#4]'
10/10 snapshots sharing following 9 elements
sun.misc.Unsafe.park(Native Method)
java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:226)
org.elasticsearch.common.util.concurrent.jsr166y.LinkedTransferQueue.awaitMatch(LinkedTransferQueue.java:702)
org.elasticsearch.common.util.concurrent.jsr166y.LinkedTransferQueue.xfer(LinkedTransferQueue.java:615)
org.elasticsearch.common.util.concurrent.jsr166y.LinkedTransferQueue.poll(LinkedTransferQueue.java:1117)
java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1068)
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1130)
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
java.lang.Thread.run(Thread.java:722)
0.2% (821.5micros out of 500ms) cpu usage by thread
'elasticsearch[statistics02][transport_server_worker][T#12]{New I/O worker
#29}'
10/10 snapshots sharing following 15 elements
sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:228)
sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:81)
sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:87)
sun.nio.ch.SelectorImpl.select(SelectorImpl.java:98)
org.elasticsearch.common.netty.channel.socket.nio.SelectorUtil.select(SelectorUtil.java:64)
org.elasticsearch.common.netty.channel.socket.nio.AbstractNioSelector.select(AbstractNioSelector.java:409)
org.elasticsearch.common.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:206)
org.elasticsearch.common.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:90)
org.elasticsearch.common.netty.channel.socket.nio.NioWorker.run(NioWorker.java:178)
org.elasticsearch.common.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
org.elasticsearch.common.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42)
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
java.lang.Thread.run(Thread.java:722)
It seems that shards that receive no updates should migrate faster, since
they are effectively already present at the destination. Why did it take so
long? And why did CPU usage increase in the first place?
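Judging by the RateLimiter.pause frames in the hot threads above, the
relocation itself looks throttled by the recovery rate limiter, which may
explain part of the duration. We haven't changed it from the default; if I
understand correctly it could be raised on the fly with a transient cluster
settings update roughly like this (the setting name is my assumption for
0.90.x, please correct me if it differs):
# assumed dynamic recovery throttle setting for 0.90.x; value is just an example
curl -XPUT 'http://web41:9200/_cluster/settings' -d '{
  "transient": {
    "indices.recovery.max_bytes_per_sec": "100mb"
  }
}'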
Let me know if I can provide more info to troubleshoot this issue; I haven't
restarted the first node yet.