I created an index with 12 shards, 0 replicas, refresh_interval set to -1 and index.translog.durability set to ASYNC. I then loaded 54M documents into the Elasticsearch cluster via a Spark job.
The job completed in 90 minutes. I then changed the index settings to
number_of_replicas = 1, refresh_interval = 1s and translog.durability = REQUEST.
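For reference, the two sets of settings above can be written as JSON request bodies: one used when creating the index for the bulk load (PUT /<index>) and one for the update-settings API (PUT /<index>/_settings) after the load. This is a minimal sketch that only builds and prints the bodies rather than talking to a cluster; the helper names bulk_load_settings and post_load_settings are hypothetical, and the lowercase durability values follow the Elasticsearch documentation.

```python
import json


def bulk_load_settings(shards: int = 12) -> dict:
    """Hypothetical helper: body for creating the index before the bulk load.

    number_of_shards can only be set when the index is created.
    """
    return {
        "settings": {
            "index": {
                "number_of_shards": shards,
                "number_of_replicas": 0,
                # -1 disables periodic refresh while loading
                "refresh_interval": "-1",
                # fsync the translog in the background (every index.translog.sync_interval)
                "translog": {"durability": "async"},
            }
        }
    }


def post_load_settings() -> dict:
    """Hypothetical helper: body for PUT /<index>/_settings once the load is done."""
    return {
        "index": {
            "number_of_replicas": 1,
            "refresh_interval": "1s",
            # fsync the translog before acknowledging each write request
            "translog": {"durability": "request"},
        }
    }


if __name__ == "__main__":
    print(json.dumps(bulk_load_settings(), indent=2))
    print(json.dumps(post_load_settings(), indent=2))
```

Only number_of_shards is fixed at creation time; the other three settings are dynamic, which is why the post-load body can change them on the live index.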
The cluster turned yellow and went back to green after about 2 hours. All shards are replicated and everything else looks good. However, CPU usage on the hosts has not come down even after several hours: all 3 nodes are averaging about 50%.
Below is the hot_threads output. How can I debug this? Is this a known problem, or am I missing some configuration?
::: {hdp-dev2}{fcVAmtoBQqqNQ78TnMjnyw}{bgpHjDwTTvODDV9i9Gy78Q}{172.30.0.175}{172.30.0.175:9300}
Hot threads at 2018-01-26T16:18:25.087Z, interval=500ms, busiestThreads=3, ignoreIdleThreads=true:
98.6% (492.7ms out of 500ms) cpu usage by thread 'elasticsearch[hdp-dev2][flush][T#27]'
4/10 snapshots sharing following 5 elements
org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:637)
org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37)
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
java.lang.Thread.run(Thread.java:748)
2/10 snapshots sharing following 5 elements
org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:633)
org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37)
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
java.lang.Thread.run(Thread.java:748)
2/10 snapshots sharing following 16 elements
java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471)
java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:708)
java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
java.util.stream.LongPipeline.reduce(LongPipeline.java:438)
java.util.stream.LongPipeline.sum(LongPipeline.java:396)
org.elasticsearch.index.translog.Translog.sizeInBytesByMinGen(Translog.java:431)
org.elasticsearch.index.translog.Translog.uncommittedSizeInBytes(Translog.java:382)
org.elasticsearch.index.translog.Translog.shouldFlush(Translog.java:532)
org.elasticsearch.index.shard.IndexShard.shouldFlush(IndexShard.java:1598)
org.elasticsearch.index.shard.IndexShard.afterWriteOperation(IndexShard.java:2372)
org.elasticsearch.index.shard.IndexShard$4.onAfter(IndexShard.java:2400)
org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.onAfter(ThreadContext.java:612)
org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:41)
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
java.lang.Thread.run(Thread.java:748)
2/10 snapshots sharing following 4 elements
org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:41)
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
java.lang.Thread.run(Thread.java:748)
98.5% (492.6ms out of 500ms) cpu usage by thread 'elasticsearch[hdp-dev2][flush][T#26]'
3/10 snapshots sharing following 4 elements
org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:41)
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
java.lang.Thread.run(Thread.java:748)
2/10 snapshots sharing following 10 elements
org.apache.lucene.search.ReferenceManager.maybeRefreshBlocking(ReferenceManager.java:253)
org.elasticsearch.index.engine.InternalEngine.refresh(InternalEngine.java:1332)
org.elasticsearch.index.engine.InternalEngine.refresh(InternalEngine.java:1314)
org.elasticsearch.index.shard.IndexShard.flush(IndexShard.java:1025)
org.elasticsearch.index.shard.IndexShard$4.doRun(IndexShard.java:2394)
org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:637)
org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37)
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
java.lang.Thread.run(Thread.java:748)
3/10 snapshots sharing following 8 elements
org.apache.lucene.util.CloseableThreadLocal.set(CloseableThreadLocal.java:97)
org.elasticsearch.common.util.concurrent.ThreadContext$ContextThreadLocal.set(ThreadContext.java:510)
org.elasticsearch.common.util.concurrent.ThreadContext.stashContext(ThreadContext.java:108)
org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:633)
org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37)
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
java.lang.Thread.run(Thread.java:748)
2/10 snapshots sharing following 7 elements
org.elasticsearch.index.shard.IndexShard.flush(IndexShard.java:1024)
org.elasticsearch.index.shard.IndexShard$4.doRun(IndexShard.java:2394)
org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:637)
org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37)
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
java.lang.Thread.run(Thread.java:748)
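When comparing hot_threads output across all 3 nodes, it can help to total the reported CPU per thread pool. The sketch below is a hypothetical helper (cpu_by_pool) that parses only the "cpu usage by thread" header lines, assuming thread names follow the elasticsearch[node][pool][T#n] pattern seen above; threads named differently (for example Lucene merge threads) are skipped. In the output above, both reported hot threads on hdp-dev2 belong to the flush pool.

```python
import re
from collections import defaultdict

# Matches header lines such as:
# 98.6% (492.7ms out of 500ms) cpu usage by thread 'elasticsearch[hdp-dev2][flush][T#27]'
HOT_THREAD_RE = re.compile(
    r"^\s*(?P<pct>\d+(?:\.\d+)?)% \(.*?\) cpu usage by thread "
    r"'elasticsearch\[(?P<node>[^\[\]]+)\]\[(?P<pool>[^\[\]]+)\]"
)


def cpu_by_pool(hot_threads_text: str) -> dict:
    """Sum the reported CPU percentage per (node, thread pool).

    Stack-frame and snapshot lines are ignored; only header lines count.
    """
    totals = defaultdict(float)
    for line in hot_threads_text.splitlines():
        m = HOT_THREAD_RE.match(line)
        if m:
            totals[(m.group("node"), m.group("pool"))] += float(m.group("pct"))
    return dict(totals)


if __name__ == "__main__":
    sample = (
        "98.6% (492.7ms out of 500ms) cpu usage by thread "
        "'elasticsearch[hdp-dev2][flush][T#27]'\n"
        "4/10 snapshots sharing following 5 elements\n"
        "98.5% (492.6ms out of 500ms) cpu usage by thread "
        "'elasticsearch[hdp-dev2][flush][T#26]'\n"
    )
    for (node, pool), pct in cpu_by_pool(sample).items():
        print(f"{node} {pool}: {pct:.1f}%")
```

Running it on the text of GET /_nodes/hot_threads taken a few times from each node would show whether the flush pool stays busy everywhere or only on some nodes.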