High CPU load (3-4) and 50% CPU utilization

Hi,

We're running a test bulk indexing with about 10-15K documents per second on 2 node cluster of c4.xlarge instances.
The bulk indexing is the only operations that we're running on the ES. (We use the latest 1.6.0 ES version)
We have 8 shards for the index (4 on each node)

We noticed that the CPU utilization doesn't go above the 50% area (the instance has 4 vCPU's), but the CPU load is very high ~4 which means 'CPU bottleneck' from what I understand... why is that? why doesn't it use all available CPU?
Is this normal?

What settings can we improve/check to lower the CPU load?
Disks doesn't seem to be the bottleneck here, as the IOPS used are not reaching the disk limits.

what hot thread said? https://www.elastic.co/guide/en/elasticsearch/reference/1.6/cluster-nodes-hot-threads.html

you are using vcpu, any cpu cycle stealing? you can see in the command top.

No CPU steal, and idle is ~50%

Below is the hot-threads output:

::: [elasticsearch1.localdomain][CXZ9TSGmQwu7GVJi7oMRWw][elasticsearch1][inet[/10.0.0.174:9300]]{aws_availability_zone=us-east-1c, max_local_storage_nodes=1}
Hot threads at 2015-07-12T15:18:43.899Z, interval=500ms, busiestThreads=3, ignoreIdleThreads=true:

4.3% (21.4ms out of 500ms) cpu usage by thread 'elasticsearch[elasticsearch1.localdomain][refresh][T#4]'
 10/10 snapshots sharing following 31 elements
   sun.nio.ch.FileDispatcherImpl.pread0(Native Method)
   sun.nio.ch.FileDispatcherImpl.pread(FileDispatcherImpl.java:52)
   sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:220)
   sun.nio.ch.IOUtil.read(IOUtil.java:197)
   sun.nio.ch.FileChannelImpl.readInternal(FileChannelImpl.java:699)
   sun.nio.ch.FileChannelImpl.read(FileChannelImpl.java:684)
   org.apache.lucene.store.NIOFSDirectory$NIOFSIndexInput.readInternal(NIOFSDirectory.java:179)
   org.apache.lucene.store.BufferedIndexInput.readBytes(BufferedIndexInput.java:160)
   org.apache.lucene.store.BufferedIndexInput.readBytes(BufferedIndexInput.java:116)
   org.apache.lucene.store.DataOutput.copyBytes(DataOutput.java:280)
   org.apache.lucene.store.Directory.copy(Directory.java:194)
   org.apache.lucene.store.TrackingDirectoryWrapper.copy(TrackingDirectoryWrapper.java:50)
   org.apache.lucene.index.IndexWriter.createCompoundFile(IndexWriter.java:4785)
   org.apache.lucene.index.DocumentsWriterPerThread.sealFlushedSegment(DocumentsWriterPerThread.java:511)
   org.apache.lucene.index.DocumentsWriterPerThread.flush(DocumentsWriterPerThread.java:478)
   org.apache.lucene.index.DocumentsWriter.doFlush(DocumentsWriter.java:521)
   org.apache.lucene.index.DocumentsWriter.flushAllThreads(DocumentsWriter.java:632)
   org.apache.lucene.index.IndexWriter.getReader(IndexWriter.java:410)
   org.apache.lucene.index.StandardDirectoryReader.doOpenFromWriter(StandardDirectoryReader.java:297)
   org.apache.lucene.index.StandardDirectoryReader.doOpenIfChanged(StandardDirectoryReader.java:272)
   org.apache.lucene.index.StandardDirectoryReader.doOpenIfChanged(StandardDirectoryReader.java:262)
   org.apache.lucene.index.DirectoryReader.openIfChanged(DirectoryReader.java:171)
   org.apache.lucene.search.SearcherManager.refreshIfNeeded(SearcherManager.java:118)
   org.apache.lucene.search.SearcherManager.refreshIfNeeded(SearcherManager.java:58)
   org.apache.lucene.search.ReferenceManager.doMaybeRefresh(ReferenceManager.java:176)
   org.apache.lucene.search.ReferenceManager.maybeRefreshBlocking(ReferenceManager.java:253)
   org.elasticsearch.index.engine.InternalEngine.refresh(InternalEngine.java:581)
   org.elasticsearch.index.engine.InternalEngine$1.run(InternalEngine.java:400)
   java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
   java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
   java.lang.Thread.run(Thread.java:745)

3.7% (18.5ms out of 500ms) cpu usage by thread 'elasticsearch[elasticsearch1.localdomain][bulk][T#2]'
 10/10 snapshots sharing following 32 elements
   sun.nio.ch.FileDispatcherImpl.pread0(Native Method)
   sun.nio.ch.FileDispatcherImpl.pread(FileDispatcherImpl.java:52)
   sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:220)
   sun.nio.ch.IOUtil.read(IOUtil.java:197)
   sun.nio.ch.FileChannelImpl.readInternal(FileChannelImpl.java:699)
   sun.nio.ch.FileChannelImpl.read(FileChannelImpl.java:684)
   org.apache.lucene.store.NIOFSDirectory$NIOFSIndexInput.readInternal(NIOFSDirectory.java:179)
   org.apache.lucene.store.BufferedIndexInput.readBytes(BufferedIndexInput.java:160)
   org.apache.lucene.store.BufferedIndexInput.readBytes(BufferedIndexInput.java:116)
   org.apache.lucene.store.DataOutput.copyBytes(DataOutput.java:280)
   org.apache.lucene.store.Directory.copy(Directory.java:194)
   org.apache.lucene.store.TrackingDirectoryWrapper.copy(TrackingDirectoryWrapper.java:50)
   org.apache.lucene.index.IndexWriter.createCompoundFile(IndexWriter.java:4785)
   org.apache.lucene.index.DocumentsWriterPerThread.sealFlushedSegment(DocumentsWriterPerThread.java:511)
   org.apache.lucene.index.DocumentsWriterPerThread.flush(DocumentsWriterPerThread.java:478)
   org.apache.lucene.index.DocumentsWriter.doFlush(DocumentsWriter.java:521)
   org.apache.lucene.index.DocumentsWriter.preUpdate(DocumentsWriter.java:364)
   org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:450)
   org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:1526)
   org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:1252)
   org.elasticsearch.index.engine.InternalEngine.innerCreateNoLock(InternalEngine.java:356)
   org.elasticsearch.index.engine.InternalEngine.innerCreate(InternalEngine.java:298)
   org.elasticsearch.index.engine.InternalEngine.create(InternalEngine.java:269)
   org.elasticsearch.index.shard.IndexShard.create(IndexShard.java:483)
   org.elasticsearch.action.bulk.TransportShardBulkAction.shardIndexOperation(TransportShardBulkAction.java:423)
   org.elasticsearch.action.bulk.TransportShardBulkAction.shardOperationOnPrimary(TransportShardBulkAction.java:148)
   org.elasticsearch.action.support.replication.TransportShardReplicationOperationAction$PrimaryPhase.performOnPrimary(TransportShardReplicationOperationAction.java:574)
   org.elasticsearch.action.support.replication.TransportShardReplicationOperationAction$PrimaryPhase$1.doRun(TransportShardReplicationOperationAction.java:440)
   org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:36)
   java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
   java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
   java.lang.Thread.run(Thread.java:745)

2.9% (14.5ms out of 500ms) cpu usage by thread 'elasticsearch[elasticsearch1.localdomain][refresh][T#1]'
 10/10 snapshots sharing following 31 elements
   sun.nio.ch.FileDispatcherImpl.pread0(Native Method)
   sun.nio.ch.FileDispatcherImpl.pread(FileDispatcherImpl.java:52)
   sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:220)
   sun.nio.ch.IOUtil.read(IOUtil.java:197)
   sun.nio.ch.FileChannelImpl.readInternal(FileChannelImpl.java:699)
   sun.nio.ch.FileChannelImpl.read(FileChannelImpl.java:684)
   org.apache.lucene.store.NIOFSDirectory$NIOFSIndexInput.readInternal(NIOFSDirectory.java:179)
   org.apache.lucene.store.BufferedIndexInput.readBytes(BufferedIndexInput.java:160)
   org.apache.lucene.store.BufferedIndexInput.readBytes(BufferedIndexInput.java:116)
   org.apache.lucene.store.DataOutput.copyBytes(DataOutput.java:280)
   org.apache.lucene.store.Directory.copy(Directory.java:194)
   org.apache.lucene.store.TrackingDirectoryWrapper.copy(TrackingDirectoryWrapper.java:50)
   org.apache.lucene.index.IndexWriter.createCompoundFile(IndexWriter.java:4785)
   org.apache.lucene.index.DocumentsWriterPerThread.sealFlushedSegment(DocumentsWriterPerThread.java:511)
   org.apache.lucene.index.DocumentsWriterPerThread.flush(DocumentsWriterPerThread.java:478)
   org.apache.lucene.index.DocumentsWriter.doFlush(DocumentsWriter.java:521)
   org.apache.lucene.index.DocumentsWriter.flushAllThreads(DocumentsWriter.java:632)
   org.apache.lucene.index.IndexWriter.getReader(IndexWriter.java:410)
   org.apache.lucene.index.StandardDirectoryReader.doOpenFromWriter(StandardDirectoryReader.java:297)
   org.apache.lucene.index.StandardDirectoryReader.doOpenIfChanged(StandardDirectoryReader.java:272)
   org.apache.lucene.index.StandardDirectoryReader.doOpenIfChanged(StandardDirectoryReader.java:262)
   org.apache.lucene.index.DirectoryReader.openIfChanged(DirectoryReader.java:171)
   org.apache.lucene.search.SearcherManager.refreshIfNeeded(SearcherManager.java:118)
   org.apache.lucene.search.SearcherManager.refreshIfNeeded(SearcherManager.java:58)
   org.apache.lucene.search.ReferenceManager.doMaybeRefresh(ReferenceManager.java:176)
   org.apache.lucene.search.ReferenceManager.maybeRefreshBlocking(ReferenceManager.java:253)
   org.elasticsearch.index.engine.InternalEngine.refresh(InternalEngine.java:581)
   org.elasticsearch.index.engine.InternalEngine$1.run(InternalEngine.java:400)
   java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
   java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
   java.lang.Thread.run(Thread.java:745)

::: [elasticsearch2.localdomain][Xl5RhqNNT1eyskbArZof2w][elasticsearch2][inet[/10.0.0.189:9300]]{aws_availability_zone=us-east-1c, max_local_storage_nodes=1}
Hot threads at 2015-07-12T15:18:43.898Z, interval=500ms, busiestThreads=3, ignoreIdleThreads=true:

54.0% (270ms out of 500ms) cpu usage by thread 'elasticsearch[elasticsearch2.localdomain][[lld-test-index1][7]: Lucene Merge Thread #32]'
3/10 snapshots sharing following 14 elements
org.apache.lucene.codecs.compressing.LZ4.encodeLiterals(LZ4.java:149)
org.apache.lucene.codecs.compressing.LZ4.encodeSequence(LZ4.java:170)
org.apache.lucene.codecs.compressing.LZ4.compress(LZ4.java:243)
org.apache.lucene.codecs.compressing.CompressionMode$LZ4FastCompressor.compress(CompressionMode.java:161)
org.apache.lucene.codecs.compressing.CompressingStoredFieldsWriter.flush(CompressingStoredFieldsWriter.java:236)
org.apache.lucene.codecs.compressing.CompressingStoredFieldsWriter.finishDocument(CompressingStoredFieldsWriter.java:166)
org.apache.lucene.codecs.compressing.CompressingStoredFieldsWriter.merge(CompressingStoredFieldsWriter.java:388)
org.apache.lucene.index.SegmentMerger.mergeFields(SegmentMerger.java:332)
org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:100)
org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:4223)
org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:3811)
org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:409)
org.apache.lucene.index.TrackingConcurrentMergeScheduler.doMerge(TrackingConcurrentMergeScheduler.java:107)
org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:486)
7/10 snapshots sharing following 7 elements
org.apache.lucene.index.SegmentMerger.mergeFields(SegmentMerger.java:332)
org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:100)
org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:4223)
org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:3811)
org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:409)
org.apache.lucene.index.TrackingConcurrentMergeScheduler.doMerge(TrackingConcurrentMergeScheduler.java:107)
org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:486)

27.8% (138.8ms out of 500ms) cpu usage by thread 'elasticsearch[elasticsearch2.localdomain][[lld-test-index1][3]: Lucene Merge Thread #32]'
5/10 snapshots sharing following 20 elements
java.lang.Thread.sleep(Native Method)
java.lang.Thread.sleep(Thread.java:340)
org.apache.lucene.store.RateLimiter$SimpleRateLimiter.pause(RateLimiter.java:151)
org.apache.lucene.store.RateLimitedFSDirectory$RateLimiterWrapper.pause(RateLimitedFSDirectory.java:96)
org.apache.lucene.store.RateLimitedIndexOutput.checkRate(RateLimitedIndexOutput.java:76)
org.apache.lucene.store.RateLimitedIndexOutput.writeBytes(RateLimitedIndexOutput.java:70)
org.apache.lucene.codecs.compressing.LZ4.encodeLiterals(LZ4.java:157)
org.apache.lucene.codecs.compressing.LZ4.encodeSequence(LZ4.java:170)
org.apache.lucene.codecs.compressing.LZ4.compress(LZ4.java:243)
org.apache.lucene.codecs.compressing.CompressionMode$LZ4FastCompressor.compress(CompressionMode.java:161)
org.apache.lucene.codecs.compressing.CompressingStoredFieldsWriter.flush(CompressingStoredFieldsWriter.java:236)
org.apache.lucene.codecs.compressing.CompressingStoredFieldsWriter.finishDocument(CompressingStoredFieldsWriter.java:166)
org.apache.lucene.codecs.compressing.CompressingStoredFieldsWriter.merge(CompressingStoredFieldsWriter.java:388)
org.apache.lucene.index.SegmentMerger.mergeFields(SegmentMerger.java:332)
org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:100)
org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:4223)
org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:3811)
org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:409)
org.apache.lucene.index.TrackingConcurrentMergeScheduler.doMerge(TrackingConcurrentMergeScheduler.java:107)
org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:486)
3/10 snapshots sharing following 19 elements
java.lang.Thread.sleep(Native Method)
java.lang.Thread.sleep(Thread.java:340)
org.apache.lucene.store.RateLimiter$SimpleRateLimiter.pause(RateLimiter.java:151)
org.apache.lucene.store.RateLimitedFSDirectory$RateLimiterWrapper.pause(RateLimitedFSDirectory.java:96)
org.apache.lucene.store.RateLimitedIndexOutput.checkRate(RateLimitedIndexOutput.java:76)
org.apache.lucene.store.RateLimitedIndexOutput.writeByte(RateLimitedIndexOutput.java:63)
org.apache.lucene.codecs.compressing.LZ4.encodeSequence(LZ4.java:175)
org.apache.lucene.codecs.compressing.LZ4.compress(LZ4.java:243)
org.apache.lucene.codecs.compressing.CompressionMode$LZ4FastCompressor.compress(CompressionMode.java:161)
org.apache.lucene.codecs.compressing.CompressingStoredFieldsWriter.flush(CompressingStoredFieldsWriter.java:236)
org.apache.lucene.codecs.compressing.CompressingStoredFieldsWriter.finishDocument(CompressingStoredFieldsWriter.java:166)
org.apache.lucene.codecs.compressing.CompressingStoredFieldsWriter.merge(CompressingStoredFieldsWriter.java:388)
org.apache.lucene.index.SegmentMerger.mergeFields(SegmentMerger.java:332)
org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:100)
org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:4223)
org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:3811)
org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:409)
org.apache.lucene.index.TrackingConcurrentMergeScheduler.doMerge(TrackingConcurrentMergeScheduler.java:107)
org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:486)
2/10 snapshots sharing following 13 elements
org.apache.lucene.codecs.compressing.LZ4.encodeSequence(LZ4.java:170)
org.apache.lucene.codecs.compressing.LZ4.compress(LZ4.java:243)
org.apache.lucene.codecs.compressing.CompressionMode$LZ4FastCompressor.compress(CompressionMode.java:161)
org.apache.lucene.codecs.compressing.CompressingStoredFieldsWriter.flush(CompressingStoredFieldsWriter.java:236)
org.apache.lucene.codecs.compressing.CompressingStoredFieldsWriter.finishDocument(CompressingStoredFieldsWriter.java:166)
org.apache.lucene.codecs.compressing.CompressingStoredFieldsWriter.merge(CompressingStoredFieldsWriter.java:388)
org.apache.lucene.index.SegmentMerger.mergeFields(SegmentMerger.java:332)
org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:100)
org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:4223)
org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:3811)
org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:409)
org.apache.lucene.index.TrackingConcurrentMergeScheduler.doMerge(TrackingConcurrentMergeScheduler.java:107)
org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:486)

looks like cpu is busy merging lucene segments. read segment merging background here https://www.elastic.co/guide/en/elasticsearch/guide/current/merge-process.html before you start to do some tuning.

hth

But why doesn't it use more than 50% CPU utilization if the CPU load is ~4?
Any ideas?