Refresh thread consumes CPU resource when changing refresh_interval to -1

Hi,

I encountered a problem in InternalIndexShard#EngineRefresher.
The problem is, 1 core consumed 100% CPU resource when changing
refresh_interval to -1.

In ES 1.3, I took a thread dump twice as below:

$ top -n1 -b -H
...
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND

4074 elastics 20 0 9.9g 5.8g 316m R 99.6 18.5 53:19.94 java
...
$ jstack -F
...
Thread 4074: (state = IN_JAVA)

  • org.apache.lucene.index.IndexReader.tryIncRef() @bci=23, line=226
    (Compiled frame; information may be imprecise)

org.apache.lucene.search.SearcherManager.tryIncRef(org.apache.lucene.search.IndexSearcher)
@bci=4, line=128 (Compiled frame)

  • org.apache.lucene.search.SearcherManager.tryIncRef(java.lang.Object)
    @bci=5, line=58 (Compiled frame)
  • org.apache.lucene.search.ReferenceManager.acquire() @bci=21, line=100
    (Compiled frame)
  • org.apache.lucene.search.SearcherManager.isSearcherCurrent() @bci=1,
    line=142 (Compiled frame)
  • org.elasticsearch.index.engine.internal.InternalEngine.refreshNeeded()
    @bci=11, line=743 (Compiled frame)

org.elasticsearch.index.shard.service.InternalIndexShard$EngineRefresher.run()
@bci=7, line=930 (Compiled frame)

  • java.util.concurrent.Executors$RunnableAdapter.call() @bci=4, line=511
    (Compiled frame)
  • java.util.concurrent.FutureTask.run() @bci=42, line=266 (Compiled frame)

java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask)
@bci=1, line=180 (Compiled frame)

java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run()
@bci=30, line=293 (Compiled frame)

java.util.concurrent.ThreadPoolExecutor.runWorker(java.util.concurrent.ThreadPoolExecutor$Worker)
@bci=95, line=1142 (Compiled frame)

  • java.util.concurrent.ThreadPoolExecutor$Worker.run() @bci=5, line=617
    (Interpreted frame)
  • java.lang.Thread.run() @bci=11, line=745 (Interpreted frame)
    ...
    $ jstack -F
    ...
    Thread 4074: (state = IN_JAVA)
  • org.elasticsearch.common.unit.TimeValue.millis() @bci=8, line=95
    (Compiled frame; information may be imprecise)

org.elasticsearch.threadpool.ThreadPool.schedule(org.elasticsearch.common.unit.TimeValue,
java.lang.String, java.lang.Runnable) @bci=30, line=229 (Compiled frame)

org.elasticsearch.index.shard.service.InternalIndexShard$EngineRefresher.run()
@bci=59, line=933 (Compiled frame)

  • java.util.concurrent.Executors$RunnableAdapter.call() @bci=4, line=511
    (Compiled frame)
  • java.util.concurrent.FutureTask.run() @bci=42, line=266 (Compiled frame)

java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask)
@bci=1, line=180 (Compiled frame)

java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run()
@bci=30, line=293 (Compiled frame)

java.util.concurrent.ThreadPoolExecutor.runWorker(java.util.concurrent.ThreadPoolExecutor$Worker)
@bci=95, line=1142 (Compiled frame)

  • java.util.concurrent.ThreadPoolExecutor$Worker.run() @bci=5, line=617
    (Interpreted frame)
  • java.lang.Thread.run() @bci=11, line=745 (Interpreted frame)
    ...

I looked into InternalIndexShard$EngineRefresher#run() and
InternalIndexShard$ApplyRefreshSettings#onRefreshSettings().
If a thread is running in EngineRefresher#run() method when
changing refresh_interval to -1, invoking
refreshScheduledFuture.cancel(false)
in ApplyRefreshSettings#onRefreshSettings() does not cancel
EngineRefresher thread. Moreover, the refresh_interval is
changed to -1 and EngineRefresher seems to be invoked
with no interval(the thread comsumes 100% CPU).

I think that a fix is to check if refreshInterval.millis() > 0.
For example, the fix is:

class EngineRefresher implements Runnable {
    @Override
    public void run() {
        // we check before if a refresh is needed, if not, we 

reschedule, otherwise, we fork, refresh, and then reschedule
if (!engine().refreshNeeded()) {
synchronized (mutex) {
if (state != IndexShardState.CLOSED &&
refreshInterval.millis() > 0) { // <== HERE
refreshScheduledFuture =
threadPool.schedule(refreshInterval, ThreadPool.Names.SAME, this);
}
}
return;
}

Could you check this problem?

Thanks,
shinsuke

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/ffd41678-4945-46ce-9405-c165e298d24a%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

I agree, this looks like a bug. I'll open an issue ... thank you for
reporting!

Mike McCandless

http://blog.mikemccandless.com

On Wed, Oct 15, 2014 at 2:40 AM, Shinsuke Sugaya shinsuke.sugaya@gmail.com
wrote:

Hi,

I encountered a problem in InternalIndexShard#EngineRefresher.
The problem is, 1 core consumed 100% CPU resource when changing
refresh_interval to -1.

In ES 1.3, I took a thread dump twice as below:

$ top -n1 -b -H
...
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND

4074 elastics 20 0 9.9g 5.8g 316m R 99.6 18.5 53:19.94 java
...
$ jstack -F
...
Thread 4074: (state = IN_JAVA)

  • org.apache.lucene.index.IndexReader.tryIncRef() @bci=23, line=226
    (Compiled frame; information may be imprecise)

org.apache.lucene.search.SearcherManager.tryIncRef(org.apache.lucene.search.IndexSearcher)
@bci=4, line=128 (Compiled frame)

  • org.apache.lucene.search.SearcherManager.tryIncRef(java.lang.Object)
    @bci=5, line=58 (Compiled frame)
  • org.apache.lucene.search.ReferenceManager.acquire() @bci=21, line=100
    (Compiled frame)
  • org.apache.lucene.search.SearcherManager.isSearcherCurrent() @bci=1,
    line=142 (Compiled frame)
  • org.elasticsearch.index.engine.internal.InternalEngine.refreshNeeded()
    @bci=11, line=743 (Compiled frame)

org.elasticsearch.index.shard.service.InternalIndexShard$EngineRefresher.run()
@bci=7, line=930 (Compiled frame)

  • java.util.concurrent.Executors$RunnableAdapter.call() @bci=4, line=511
    (Compiled frame)
  • java.util.concurrent.FutureTask.run() @bci=42, line=266 (Compiled frame)

java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask)
@bci=1, line=180 (Compiled frame)

java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run()
@bci=30, line=293 (Compiled frame)

java.util.concurrent.ThreadPoolExecutor.runWorker(java.util.concurrent.ThreadPoolExecutor$Worker)
@bci=95, line=1142 (Compiled frame)

  • java.util.concurrent.ThreadPoolExecutor$Worker.run() @bci=5, line=617
    (Interpreted frame)
  • java.lang.Thread.run() @bci=11, line=745 (Interpreted frame)
    ...
    $ jstack -F
    ...
    Thread 4074: (state = IN_JAVA)
  • org.elasticsearch.common.unit.TimeValue.millis() @bci=8, line=95
    (Compiled frame; information may be imprecise)

org.elasticsearch.threadpool.ThreadPool.schedule(org.elasticsearch.common.unit.TimeValue,
java.lang.String, java.lang.Runnable) @bci=30, line=229 (Compiled frame)

org.elasticsearch.index.shard.service.InternalIndexShard$EngineRefresher.run()
@bci=59, line=933 (Compiled frame)

  • java.util.concurrent.Executors$RunnableAdapter.call() @bci=4, line=511
    (Compiled frame)
  • java.util.concurrent.FutureTask.run() @bci=42, line=266 (Compiled frame)

java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask)
@bci=1, line=180 (Compiled frame)

java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run()
@bci=30, line=293 (Compiled frame)

java.util.concurrent.ThreadPoolExecutor.runWorker(java.util.concurrent.ThreadPoolExecutor$Worker)
@bci=95, line=1142 (Compiled frame)

  • java.util.concurrent.ThreadPoolExecutor$Worker.run() @bci=5, line=617
    (Interpreted frame)
  • java.lang.Thread.run() @bci=11, line=745 (Interpreted frame)
    ...

I looked into InternalIndexShard$EngineRefresher#run() and
InternalIndexShard$ApplyRefreshSettings#onRefreshSettings().
If a thread is running in EngineRefresher#run() method when
changing refresh_interval to -1, invoking
refreshScheduledFuture.cancel(false)
in ApplyRefreshSettings#onRefreshSettings() does not cancel
EngineRefresher thread. Moreover, the refresh_interval is
changed to -1 and EngineRefresher seems to be invoked
with no interval(the thread comsumes 100% CPU).

I think that a fix is to check if refreshInterval.millis() > 0.
For example, the fix is:

class EngineRefresher implements Runnable {
    @Override
    public void run() {
        // we check before if a refresh is needed, if not, we

reschedule, otherwise, we fork, refresh, and then reschedule
if (!engine().refreshNeeded()) {
synchronized (mutex) {
if (state != IndexShardState.CLOSED &&
refreshInterval.millis() > 0) { // <== HERE
refreshScheduledFuture =
threadPool.schedule(refreshInterval, ThreadPool.Names.SAME, this);
}
}
return;
}

Could you check this problem?

Thanks,
shinsuke

--
You received this message because you are subscribed to the Google Groups
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an
email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit
https://groups.google.com/d/msgid/elasticsearch/ffd41678-4945-46ce-9405-c165e298d24a%40googlegroups.com
https://groups.google.com/d/msgid/elasticsearch/ffd41678-4945-46ce-9405-c165e298d24a%40googlegroups.com?utm_medium=email&utm_source=footer
.
For more options, visit https://groups.google.com/d/optout.

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/CAD7smRcshkeefk%3DXeXFry7Y5-JGJjBN9iDP8riug9qQfDx%3D5sw%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.

OK this will be fixed in the next ES release:

Thank you for reporting this.

Mike McCandless

http://blog.mikemccandless.com

On Wed, Oct 15, 2014 at 2:40 AM, Shinsuke Sugaya shinsuke.sugaya@gmail.com
wrote:

Hi,

I encountered a problem in InternalIndexShard#EngineRefresher.
The problem is, 1 core consumed 100% CPU resource when changing
refresh_interval to -1.

In ES 1.3, I took a thread dump twice as below:

$ top -n1 -b -H
...
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND

4074 elastics 20 0 9.9g 5.8g 316m R 99.6 18.5 53:19.94 java
...
$ jstack -F
...
Thread 4074: (state = IN_JAVA)

  • org.apache.lucene.index.IndexReader.tryIncRef() @bci=23, line=226
    (Compiled frame; information may be imprecise)

org.apache.lucene.search.SearcherManager.tryIncRef(org.apache.lucene.search.IndexSearcher)
@bci=4, line=128 (Compiled frame)

  • org.apache.lucene.search.SearcherManager.tryIncRef(java.lang.Object)
    @bci=5, line=58 (Compiled frame)
  • org.apache.lucene.search.ReferenceManager.acquire() @bci=21, line=100
    (Compiled frame)
  • org.apache.lucene.search.SearcherManager.isSearcherCurrent() @bci=1,
    line=142 (Compiled frame)
  • org.elasticsearch.index.engine.internal.InternalEngine.refreshNeeded()
    @bci=11, line=743 (Compiled frame)

org.elasticsearch.index.shard.service.InternalIndexShard$EngineRefresher.run()
@bci=7, line=930 (Compiled frame)

  • java.util.concurrent.Executors$RunnableAdapter.call() @bci=4, line=511
    (Compiled frame)
  • java.util.concurrent.FutureTask.run() @bci=42, line=266 (Compiled frame)

java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask)
@bci=1, line=180 (Compiled frame)

java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run()
@bci=30, line=293 (Compiled frame)

java.util.concurrent.ThreadPoolExecutor.runWorker(java.util.concurrent.ThreadPoolExecutor$Worker)
@bci=95, line=1142 (Compiled frame)

  • java.util.concurrent.ThreadPoolExecutor$Worker.run() @bci=5, line=617
    (Interpreted frame)
  • java.lang.Thread.run() @bci=11, line=745 (Interpreted frame)
    ...
    $ jstack -F
    ...
    Thread 4074: (state = IN_JAVA)
  • org.elasticsearch.common.unit.TimeValue.millis() @bci=8, line=95
    (Compiled frame; information may be imprecise)

org.elasticsearch.threadpool.ThreadPool.schedule(org.elasticsearch.common.unit.TimeValue,
java.lang.String, java.lang.Runnable) @bci=30, line=229 (Compiled frame)

org.elasticsearch.index.shard.service.InternalIndexShard$EngineRefresher.run()
@bci=59, line=933 (Compiled frame)

  • java.util.concurrent.Executors$RunnableAdapter.call() @bci=4, line=511
    (Compiled frame)
  • java.util.concurrent.FutureTask.run() @bci=42, line=266 (Compiled frame)

java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask)
@bci=1, line=180 (Compiled frame)

java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run()
@bci=30, line=293 (Compiled frame)

java.util.concurrent.ThreadPoolExecutor.runWorker(java.util.concurrent.ThreadPoolExecutor$Worker)
@bci=95, line=1142 (Compiled frame)

  • java.util.concurrent.ThreadPoolExecutor$Worker.run() @bci=5, line=617
    (Interpreted frame)
  • java.lang.Thread.run() @bci=11, line=745 (Interpreted frame)
    ...

I looked into InternalIndexShard$EngineRefresher#run() and
InternalIndexShard$ApplyRefreshSettings#onRefreshSettings().
If a thread is running in EngineRefresher#run() method when
changing refresh_interval to -1, invoking
refreshScheduledFuture.cancel(false)
in ApplyRefreshSettings#onRefreshSettings() does not cancel
EngineRefresher thread. Moreover, the refresh_interval is
changed to -1 and EngineRefresher seems to be invoked
with no interval(the thread comsumes 100% CPU).

I think that a fix is to check if refreshInterval.millis() > 0.
For example, the fix is:

class EngineRefresher implements Runnable {
    @Override
    public void run() {
        // we check before if a refresh is needed, if not, we

reschedule, otherwise, we fork, refresh, and then reschedule
if (!engine().refreshNeeded()) {
synchronized (mutex) {
if (state != IndexShardState.CLOSED &&
refreshInterval.millis() > 0) { // <== HERE
refreshScheduledFuture =
threadPool.schedule(refreshInterval, ThreadPool.Names.SAME, this);
}
}
return;
}

Could you check this problem?

Thanks,
shinsuke

--
You received this message because you are subscribed to the Google Groups
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an
email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit
https://groups.google.com/d/msgid/elasticsearch/ffd41678-4945-46ce-9405-c165e298d24a%40googlegroups.com
https://groups.google.com/d/msgid/elasticsearch/ffd41678-4945-46ce-9405-c165e298d24a%40googlegroups.com?utm_medium=email&utm_source=footer
.
For more options, visit https://groups.google.com/d/optout.

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/CAD7smRfnMdhLBMbL6CU68XQ4riSd08o3BQWCy2afo1R-HJBdHw%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.

Thank you for fixing it!

-shinsuke

On Wednesday, October 15, 2014 5:52:15 PM UTC+9, Michael McCandless wrote:

OK this will be fixed in the next ES release:
Only schedule another refresh if `refresh_interval` is positive by mikemccand · Pull Request #8087 · elastic/elasticsearch · GitHub

Thank you for reporting this.

Mike McCandless

http://blog.mikemccandless.com

On Wed, Oct 15, 2014 at 2:40 AM, Shinsuke Sugaya <shinsuk...@gmail.com
<javascript:>> wrote:

Hi,

I encountered a problem in InternalIndexShard#EngineRefresher.
The problem is, 1 core consumed 100% CPU resource when changing
refresh_interval to -1.

In ES 1.3, I took a thread dump twice as below:

$ top -n1 -b -H
...
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND

4074 elastics 20 0 9.9g 5.8g 316m R 99.6 18.5 53:19.94 java
...
$ jstack -F
...
Thread 4074: (state = IN_JAVA)

  • org.apache.lucene.index.IndexReader.tryIncRef() @bci=23, line=226
    (Compiled frame; information may be imprecise)

org.apache.lucene.search.SearcherManager.tryIncRef(org.apache.lucene.search.IndexSearcher)
@bci=4, line=128 (Compiled frame)

  • org.apache.lucene.search.SearcherManager.tryIncRef(java.lang.Object)
    @bci=5, line=58 (Compiled frame)
  • org.apache.lucene.search.ReferenceManager.acquire() @bci=21, line=100
    (Compiled frame)
  • org.apache.lucene.search.SearcherManager.isSearcherCurrent() @bci=1,
    line=142 (Compiled frame)
  • org.elasticsearch.index.engine.internal.InternalEngine.refreshNeeded()
    @bci=11, line=743 (Compiled frame)

org.elasticsearch.index.shard.service.InternalIndexShard$EngineRefresher.run()
@bci=7, line=930 (Compiled frame)

  • java.util.concurrent.Executors$RunnableAdapter.call() @bci=4, line=511
    (Compiled frame)
  • java.util.concurrent.FutureTask.run() @bci=42, line=266 (Compiled
    frame)

java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask)
@bci=1, line=180 (Compiled frame)

java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run()
@bci=30, line=293 (Compiled frame)

java.util.concurrent.ThreadPoolExecutor.runWorker(java.util.concurrent.ThreadPoolExecutor$Worker)
@bci=95, line=1142 (Compiled frame)

  • java.util.concurrent.ThreadPoolExecutor$Worker.run() @bci=5, line=617
    (Interpreted frame)
  • java.lang.Thread.run() @bci=11, line=745 (Interpreted frame)
    ...
    $ jstack -F
    ...
    Thread 4074: (state = IN_JAVA)
  • org.elasticsearch.common.unit.TimeValue.millis() @bci=8, line=95
    (Compiled frame; information may be imprecise)

org.elasticsearch.threadpool.ThreadPool.schedule(org.elasticsearch.common.unit.TimeValue,
java.lang.String, java.lang.Runnable) @bci=30, line=229 (Compiled frame)

org.elasticsearch.index.shard.service.InternalIndexShard$EngineRefresher.run()
@bci=59, line=933 (Compiled frame)

  • java.util.concurrent.Executors$RunnableAdapter.call() @bci=4, line=511
    (Compiled frame)
  • java.util.concurrent.FutureTask.run() @bci=42, line=266 (Compiled
    frame)

java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask)
@bci=1, line=180 (Compiled frame)

java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run()
@bci=30, line=293 (Compiled frame)

java.util.concurrent.ThreadPoolExecutor.runWorker(java.util.concurrent.ThreadPoolExecutor$Worker)
@bci=95, line=1142 (Compiled frame)

  • java.util.concurrent.ThreadPoolExecutor$Worker.run() @bci=5, line=617
    (Interpreted frame)
  • java.lang.Thread.run() @bci=11, line=745 (Interpreted frame)
    ...

I looked into InternalIndexShard$EngineRefresher#run() and
InternalIndexShard$ApplyRefreshSettings#onRefreshSettings().
If a thread is running in EngineRefresher#run() method when
changing refresh_interval to -1, invoking
refreshScheduledFuture.cancel(false)
in ApplyRefreshSettings#onRefreshSettings() does not cancel
EngineRefresher thread. Moreover, the refresh_interval is
changed to -1 and EngineRefresher seems to be invoked
with no interval(the thread comsumes 100% CPU).

I think that a fix is to check if refreshInterval.millis() > 0.
For example, the fix is:

class EngineRefresher implements Runnable {
    @Override
    public void run() {
        // we check before if a refresh is needed, if not, we 

reschedule, otherwise, we fork, refresh, and then reschedule
if (!engine().refreshNeeded()) {
synchronized (mutex) {
if (state != IndexShardState.CLOSED &&
refreshInterval.millis() > 0) { // <== HERE
refreshScheduledFuture =
threadPool.schedule(refreshInterval, ThreadPool.Names.SAME, this);
}
}
return;
}

Could you check this problem?

Thanks,
shinsuke

--
You received this message because you are subscribed to the Google Groups
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an
email to elasticsearc...@googlegroups.com <javascript:>.
To view this discussion on the web visit
https://groups.google.com/d/msgid/elasticsearch/ffd41678-4945-46ce-9405-c165e298d24a%40googlegroups.com
https://groups.google.com/d/msgid/elasticsearch/ffd41678-4945-46ce-9405-c165e298d24a%40googlegroups.com?utm_medium=email&utm_source=footer
.
For more options, visit https://groups.google.com/d/optout.

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/803ca2e9-9326-401f-9fb1-a64efcff6066%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.