My Elasticsearch cluster has 3 nodes running JVM 1.8.0_144 and ES 6.1.2.
Here is the node information:
| ip | heap.percent | ram.percent | cpu | load_1m | load_5m | load_15m | node.role | master | name |
|---|---|---|---|---|---|---|---|---|---|
| 172.30.0.175 | 55 | 99 | 55 | 2.13 | 2.06 | 2.15 | mdi | - | hdp-dev2 |
| 172.30.0.170 | 25 | 94 | 53 | 1.96 | 2.30 | 2.49 | mdi | - | hdp-dev3 |
| 172.30.0.159 | 63 | 99 | 54 | 2.32 | 2.39 | 2.35 | mdi | * | hdp-dev1 |
I created an index with 12 shards, 0 replicas, `refresh_interval` set to -1, and `index.translog.durability` set to `async`. I loaded 54M documents into the Elasticsearch cluster via a Spark job.
The job completed in 90 minutes. I then changed the index settings to `number_of_replicas: 1`, `refresh_interval: 1s`, and `index.translog.durability: request`.
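For reference, the two settings phases described above can be expressed as request bodies. This is a minimal sketch, assuming the standard `PUT /<index>` (create) and `PUT /<index>/_settings` (update) endpoints; the host, port 9200 (the default HTTP port), and the index name `my_index` are hypothetical, and the Spark loading step is not shown. The requests are only built here, not sent.

```python
import json
import urllib.request

# Hypothetical values: adjust to your cluster. 9200 is the default HTTP port.
HOST = "http://172.30.0.159:9200"
INDEX = "my_index"


def bulk_load_settings(shards: int = 12) -> dict:
    """Body for `PUT /<index>` used during the bulk load (settings from the post)."""
    return {
        "settings": {
            "index": {
                "number_of_shards": shards,
                "number_of_replicas": 0,
                "refresh_interval": "-1",              # periodic refresh disabled
                "translog": {"durability": "async"},   # translog fsynced in the background
            }
        }
    }


def post_load_settings() -> dict:
    """Body for `PUT /<index>/_settings` applied after the load finished."""
    return {
        "index": {
            "number_of_replicas": 1,
            "refresh_interval": "1s",
            "translog": {"durability": "request"},  # translog fsynced after every request
        }
    }


def json_request(method: str, url: str, body: dict) -> urllib.request.Request:
    """Build (but do not send) a JSON request; ES 6.x requires a Content-Type header."""
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method=method,
    )


create_req = json_request("PUT", f"{HOST}/{INDEX}", bulk_load_settings())
update_req = json_request("PUT", f"{HOST}/{INDEX}/_settings", post_load_settings())
# Sending is left out on purpose: urllib.request.urlopen(update_req) would apply it.
```

Keeping both bodies side by side makes it easy to diff exactly which settings changed between the load phase and normal operation.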
The cluster turned yellow and, after 2 hours, changed back to green. All shards are replicated and everything looks good. However, the CPU usage on the hosts has not come down even after several hours: all 3 nodes average about 50%.
Below is the `hot_threads` output from hdp-dev2. How can I debug this? Is this a known problem, or am I missing some configuration?
```
::: {hdp-dev2}{fcVAmtoBQqqNQ78TnMjnyw}{bgpHjDwTTvODDV9i9Gy78Q}{172.30.0.175}{172.30.0.175:9300}
   Hot threads at 2018-01-26T16:18:25.087Z, interval=500ms, busiestThreads=3, ignoreIdleThreads=true:

   98.6% (492.7ms out of 500ms) cpu usage by thread 'elasticsearch[hdp-dev2][flush][T#27]'
     4/10 snapshots sharing following 5 elements
       org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:637)
       org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37)
       java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
       java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
       java.lang.Thread.run(Thread.java:748)
     2/10 snapshots sharing following 5 elements
       org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:633)
       org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37)
       java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
       java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
       java.lang.Thread.run(Thread.java:748)
     2/10 snapshots sharing following 16 elements
       java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471)
       java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:708)
       java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
       java.util.stream.LongPipeline.reduce(LongPipeline.java:438)
       java.util.stream.LongPipeline.sum(LongPipeline.java:396)
       org.elasticsearch.index.translog.Translog.sizeInBytesByMinGen(Translog.java:431)
       org.elasticsearch.index.translog.Translog.uncommittedSizeInBytes(Translog.java:382)
       org.elasticsearch.index.translog.Translog.shouldFlush(Translog.java:532)
       org.elasticsearch.index.shard.IndexShard.shouldFlush(IndexShard.java:1598)
       org.elasticsearch.index.shard.IndexShard.afterWriteOperation(IndexShard.java:2372)
       org.elasticsearch.index.shard.IndexShard$4.onAfter(IndexShard.java:2400)
       org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.onAfter(ThreadContext.java:612)
       org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:41)
       java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
       java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
       java.lang.Thread.run(Thread.java:748)
     2/10 snapshots sharing following 4 elements
       org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:41)
       java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
       java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
       java.lang.Thread.run(Thread.java:748)

   98.5% (492.6ms out of 500ms) cpu usage by thread 'elasticsearch[hdp-dev2][flush][T#26]'
     3/10 snapshots sharing following 4 elements
       org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:41)
       java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
       java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
       java.lang.Thread.run(Thread.java:748)
     2/10 snapshots sharing following 10 elements
       org.apache.lucene.search.ReferenceManager.maybeRefreshBlocking(ReferenceManager.java:253)
       org.elasticsearch.index.engine.InternalEngine.refresh(InternalEngine.java:1332)
       org.elasticsearch.index.engine.InternalEngine.refresh(InternalEngine.java:1314)
       org.elasticsearch.index.shard.IndexShard.flush(IndexShard.java:1025)
       org.elasticsearch.index.shard.IndexShard$4.doRun(IndexShard.java:2394)
       org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:637)
       org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37)
       java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
       java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
       java.lang.Thread.run(Thread.java:748)
     3/10 snapshots sharing following 8 elements
       org.apache.lucene.util.CloseableThreadLocal.set(CloseableThreadLocal.java:97)
       org.elasticsearch.common.util.concurrent.ThreadContext$ContextThreadLocal.set(ThreadContext.java:510)
       org.elasticsearch.common.util.concurrent.ThreadContext.stashContext(ThreadContext.java:108)
       org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:633)
       org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37)
       java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
       java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
       java.lang.Thread.run(Thread.java:748)
     2/10 snapshots sharing following 7 elements
       org.elasticsearch.index.shard.IndexShard.flush(IndexShard.java:1024)
       org.elasticsearch.index.shard.IndexShard$4.doRun(IndexShard.java:2394)
       org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:637)
       org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37)
       java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
       java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
       java.lang.Thread.run(Thread.java:748)
```
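Both busy threads in the hot_threads output above belong to the `flush` thread pool. When collecting this output from every node, a small script can summarize which pools the busy threads belong to. The sketch below assumes the `GET /_nodes/hot_threads` endpoint with the `threads`, `interval`, and `ignore_idle_threads` parameters (matching the `busiestThreads=3`, `interval=500ms`, `ignoreIdleThreads=true` header above); the host is hypothetical, and thread names that do not follow the `elasticsearch[node][pool][T#n]` pattern (for example Lucene merge threads) are simply skipped.

```python
import re
from collections import Counter
from typing import List, Tuple
from urllib.parse import urlencode


def hot_threads_url(host: str, threads: int = 3, interval: str = "500ms",
                    ignore_idle_threads: bool = True) -> str:
    """URL for `GET /_nodes/hot_threads` with the parameters shown in the output header."""
    params = {
        "threads": threads,
        "interval": interval,
        "ignore_idle_threads": str(ignore_idle_threads).lower(),
    }
    return f"{host}/_nodes/hot_threads?{urlencode(params)}"


# Summary line, e.g.
# 98.6% (492.7ms out of 500ms) cpu usage by thread 'elasticsearch[hdp-dev2][flush][T#27]'
THREAD_LINE = re.compile(
    r"(?P<pct>\d+(?:\.\d+)?)% \([^)]*\) cpu usage by thread "
    r"'elasticsearch\[(?P<node>[^\]]+)\]\[(?P<pool>[^\]]+)\]\[T#\d+\]'"
)


def busy_threads(text: str) -> List[Tuple[str, str, float]]:
    """(node, thread pool, cpu %) for each summary line that matches THREAD_LINE."""
    return [(m["node"], m["pool"], float(m["pct"])) for m in THREAD_LINE.finditer(text)]


def busy_pools(text: str) -> Counter:
    """Number of busy threads per thread pool."""
    return Counter(pool for _, pool, _ in busy_threads(text))


# Two summary lines copied from the output above (stack frames trimmed).
SAMPLE = """
   98.6% (492.7ms out of 500ms) cpu usage by thread 'elasticsearch[hdp-dev2][flush][T#27]'
     4/10 snapshots sharing following 5 elements
   98.5% (492.6ms out of 500ms) cpu usage by thread 'elasticsearch[hdp-dev2][flush][T#26]'
"""

if __name__ == "__main__":
    # Hypothetical host; 9200 is the default HTTP port.
    print(hot_threads_url("http://172.30.0.159:9200"))
    print(busy_pools(SAMPLE))  # Counter({'flush': 2})
```

Running the same summary over the output of all three nodes, taken a few times in a row, shows whether the `flush` pool stays busy everywhere or only on some nodes.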