GET _nodes/hot_threads shows warm nodes doing flush forever

Hello,

I am using a hot-warm architecture in a single cluster. I have posted about this before but haven't found a satisfactory answer yet, so I am posting it again here.

My indices are allocated from the hot nodes to the warm nodes on a daily basis (using Curator at around 12 pm), but the indices allocated to the warm nodes are being flushed all day. It's 11:30 am and the next Curator run is only 30 minutes away, yet the warm nodes are still busy flushing (about 23 hours so far).
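For context, a Curator `allocation` action configured with `key: box_type`, `value: warm` and the default `allocation_type: require` boils down to updating each selected index's settings so its shards must live on nodes with the `box_type=warm` attribute (the attribute visible in the node header of the hot_threads output). Elasticsearch then relocates the shards to the warm nodes. The minimal Python sketch below only builds that settings body; the helper name `allocation_settings` is hypothetical and nothing here talks to a cluster.

```python
import json


def allocation_settings(key: str, value: str, allocation_type: str = "require") -> dict:
    """Build the body for `PUT <index>/_settings` that pins an index to nodes
    whose custom node attribute `key` equals `value` (e.g. box_type=warm).

    Hypothetical helper for illustration; it mirrors the setting name that
    Curator's allocation action writes, it does not call Curator itself.
    """
    if allocation_type not in ("require", "include", "exclude"):
        raise ValueError(f"unsupported allocation_type: {allocation_type}")
    return {f"index.routing.allocation.{allocation_type}.{key}": value}


if __name__ == "__main__":
    # Body you would send with: PUT <index>/_settings
    print(json.dumps(allocation_settings("box_type", "warm")))
```

Changing this setting only triggers shard relocation (a peer recovery from a hot node to a warm node). It does not by itself schedule any extra flushing on the warm side.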

Below is part of the hot_threads output from one of the warm nodes.

Please advise why the 'flush' job takes so long. What are the warm nodes doing after the Curator allocation job has completed?

If there is a post or blog page that explains in detail how 'allocation' from node A to node B works, please share it with me.

FYI, my cluster has 3 hot and 3 warm nodes, and about 400 GB of indices are allocated from the hot nodes to the warm nodes every day.

Thank you!

GET _nodes/hot_threads

::: {suy-prd-opr-els-08}{Rn4xDiJMQ5uRod5jxse5HQ}{pRMmLTsHQ8GFnNBMXAmjcQ}{10.10.14.198}{10.10.14.198:9300}{ml.max_open_jobs=10, box_type=warm, ml.enabled=true}
Hot threads at 2019-01-08T02:29:11.744Z, interval=500ms, busiestThreads=3, ignoreIdleThreads=true:

91.7% (458.6ms out of 500ms) cpu usage by thread 'elasticsearch[suy-prd-opr-els-08][flush][T#23]'
3/10 snapshots sharing following 8 elements
org.apache.lucene.util.CloseableThreadLocal.set(CloseableThreadLocal.java:97)
org.elasticsearch.common.util.concurrent.ThreadContext$ContextThreadLocal.set(ThreadContext.java:511)
org.elasticsearch.common.util.concurrent.ThreadContext.stashContext(ThreadContext.java:108)
org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:634)
org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37)
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
java.lang.Thread.run(Thread.java:748)
3/10 snapshots sharing following 10 elements
org.apache.lucene.util.CloseableThreadLocal.set(CloseableThreadLocal.java:97)
org.elasticsearch.common.util.concurrent.ThreadContext$ContextThreadLocal.set(ThreadContext.java:511)
org.elasticsearch.common.util.concurrent.ThreadContext.lambda$newStoredContext$2(ThreadContext.java:135)
org.elasticsearch.common.util.concurrent.ThreadContext$$Lambda$1922/500934560.close(Unknown Source)
org.elasticsearch.common.util.concurrent.ThreadContext$StoredContext.restore(ThreadContext.java:348)
org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:636)
org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37)
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
java.lang.Thread.run(Thread.java:748)
4/10 snapshots sharing following 10 elements
org.apache.lucene.util.CloseableThreadLocal.set(CloseableThreadLocal.java:97)
org.elasticsearch.common.util.concurrent.ThreadContext$ContextThreadLocal.set(ThreadContext.java:509)
org.elasticsearch.common.util.concurrent.ThreadContext.lambda$stashContext$0(ThreadContext.java:109)
org.elasticsearch.common.util.concurrent.ThreadContext$$Lambda$1923/1307480596.close(Unknown Source)
org.elasticsearch.common.util.concurrent.ThreadContext$StoredContext.restore(ThreadContext.java:348)
org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.onAfter(ThreadContext.java:616)
org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:41)
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
java.lang.Thread.run(Thread.java:748)

89.7% (448.3ms out of 500ms) cpu usage by thread 'elasticsearch[suy-prd-opr-els-08][flush][T#24]'
3/10 snapshots sharing following 8 elements
org.apache.lucene.util.CloseableThreadLocal.set(CloseableThreadLocal.java:97)
org.elasticsearch.common.util.concurrent.ThreadContext$ContextThreadLocal.set(ThreadContext.java:511)
org.elasticsearch.common.util.concurrent.ThreadContext.stashContext(ThreadContext.java:108)
org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:634)
org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37)
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
java.lang.Thread.run(Thread.java:748)
3/10 snapshots sharing following 10 elements
org.apache.lucene.util.CloseableThreadLocal.set(CloseableThreadLocal.java:97)
org.elasticsearch.common.util.concurrent.ThreadContext$ContextThreadLocal.set(ThreadContext.java:511)
org.elasticsearch.common.util.concurrent.ThreadContext.lambda$newStoredContext$2(ThreadContext.java:135)
org.elasticsearch.common.util.concurrent.ThreadContext$$Lambda$1922/500934560.close(Unknown Source)
org.elasticsearch.common.util.concurrent.ThreadContext$StoredContext.restore(ThreadContext.java:348)
org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:636)
org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37)
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
java.lang.Thread.run(Thread.java:748)
...
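When the hot_threads dump is long, it can help to total the CPU share per thread pool so it is obvious whether `flush` really dominates on every warm node. The sketch below is a hypothetical helper (`busy_pools`) that only parses header lines of the form shown above (`NN.N% (... out of ...ms) cpu usage by thread 'elasticsearch[node][pool][T#n]'`). Threads named differently, such as Lucene merge threads, are simply skipped. The totals are sums across threads, so they can exceed 100%.

```python
import re
from collections import defaultdict

# Matches hot_threads header lines such as:
# 91.7% (458.6ms out of 500ms) cpu usage by thread 'elasticsearch[node][flush][T#23]'
HEADER = re.compile(
    r"^\s*(?P<pct>\d+(?:\.\d+)?)% \((?P<ms>[\d.]+)ms out of (?P<interval>[\d.]+)ms\) "
    r"cpu usage by thread 'elasticsearch\[(?P<node>[^\]]+)\]\[(?P<pool>[^\]]+)\]\[(?P<thread>[^\]]+)\]'"
)


def busy_pools(hot_threads_text: str) -> dict:
    """Sum the reported CPU percentage per (node name, thread pool)."""
    totals = defaultdict(float)
    for line in hot_threads_text.splitlines():
        m = HEADER.match(line)
        if m:  # lines that are not thread headers (stack frames etc.) are ignored
            totals[(m["node"], m["pool"])] += float(m["pct"])
    return dict(totals)


if __name__ == "__main__":
    sample = (
        "91.7% (458.6ms out of 500ms) cpu usage by thread "
        "'elasticsearch[suy-prd-opr-els-08][flush][T#23]'\n"
        "org.apache.lucene.util.CloseableThreadLocal.set(CloseableThreadLocal.java:97)\n"
        "89.7% (448.3ms out of 500ms) cpu usage by thread "
        "'elasticsearch[suy-prd-opr-els-08][flush][T#24]'\n"
    )
    for (node, pool), pct in busy_pools(sample).items():
        print(f"{node} {pool}: {pct:.1f}%")
```

Feeding it the two flush threads above reports roughly 181.4% for `suy-prd-opr-els-08` / `flush`, i.e. close to two full cores spent in that pool during the 500 ms sample.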
