Stats API is slow sometimes


(Bdharrington7) #1

Hi, we've been seeing strange issues in our cluster for a while now, and I'm hoping this is a clue: sometimes loading the stats API (localhost:9200/_nodes/stats?all=true) takes anywhere from 5 seconds to 5 minutes to return. I thought it might be a cold-JVM issue, but in testing I can let the cluster sit for 10 seconds or 10 minutes and the API is sometimes slow and sometimes very responsive. I've checked top for high CPU usage, but nothing looks unusual there. Here's my setup:

Elasticsearch 1.7.4
Java 1.8.0_45
Running on CentOS 6.4 machines with 15 GB RAM, hosted on OpenStack
6 data nodes (also acting as client nodes), 3 master nodes

The cluster is barely used: it holds about 100 docs and no one is querying it. Has anyone else experienced this?
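When reproducing this, it helps to record the wall-clock time curl sees for each call; a minimal sketch (host and port taken from the example above; the loop count and sleep are arbitrary):

```shell
# Time the node stats endpoint a few times in a row; time_total is in seconds.
# Host/port match the original example; adjust as needed for your cluster.
for i in 1 2 3 4 5; do
  curl -s -o /dev/null -w "attempt $i: HTTP %{response_code} in %{time_total}s\n" \
    'http://localhost:9200/_nodes/stats?all=true'
  sleep 2
done
```

Seeing a mix of sub-second and multi-second `time_total` values across back-to-back attempts would confirm the intermittent behavior rather than a consistently slow endpoint.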


(Jason Tedor) #2

When it's hanging, are you able to get a report from the hot threads API, or can you take a stack dump (with jstack) of all of the nodes on your cluster?
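A sketch of capturing a jstack dump on one node while the call is hanging (assumes the JDK's jstack is on the PATH; the pgrep pattern matches the standard Elasticsearch main class and may need adjusting for your install):

```shell
# Find the Elasticsearch JVM's PID and write a full thread dump to /tmp.
# The pgrep pattern is an assumption; verify with `ps aux | grep elasticsearch`.
ES_PID=$(pgrep -f 'org.elasticsearch.bootstrap.Elasticsearch' | head -n 1)
jstack "$ES_PID" > "/tmp/es-stack-$(hostname)-$(date +%Y%m%dT%H%M%S).txt"
```

Taking two or three dumps a few seconds apart during a slow request makes it easier to spot a thread that stays blocked in the same place.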

How much heap have you allocated to the Elasticsearch process?


(Bdharrington7) #3

Each node has about half the machine's memory allocated to the Elasticsearch heap.
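You can confirm what each JVM actually received (rather than what was configured) from the node stats; a quick check, assuming the same localhost:9200 endpoint as above:

```shell
# Print each node's name plus current and maximum heap from the JVM stats.
# Field names are from the 1.x node stats API; grep is a rough filter.
curl -s 'http://localhost:9200/_nodes/stats/jvm?pretty' \
  | grep -E '"(name|heap_used_in_bytes|heap_max_in_bytes)"'
```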


(Bdharrington7) #4

Here are a couple of queries to hot_threads during a period when it hangs:

[brian@data-04 ~]$ curl -XGET localhost:9200/_nodes/_local/hot_threads
::: [data-04][ascqiAteQRG65CIk78FMbg][data-04.novalocal][inet[/10.xxx.xx.191:9300]]{http.enabled=true, master=false}
   Hot threads at 2016-01-29T00:22:47.680Z, interval=500ms, busiestThreads=3, ignoreIdleThreads=true:
   
    0.0% (167.3micros out of 500ms) cpu usage by thread 'elasticsearch[data-04][transport_client_timer][T#1]{Hashed wheel timer #1}'
     10/10 snapshots sharing following 5 elements
       java.lang.Thread.sleep(Native Method)
       org.elasticsearch.common.netty.util.HashedWheelTimer$Worker.waitForNextTick(HashedWheelTimer.java:445)
       org.elasticsearch.common.netty.util.HashedWheelTimer$Worker.run(HashedWheelTimer.java:364)
       org.elasticsearch.common.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
       java.lang.Thread.run(Thread.java:745)

[brian@data-04 ~]$ curl -XGET localhost:9200/_nodes/_local/hot_threads
::: [data-04][ascqiAteQRG65CIk78FMbg][data-04.novalocal][inet[/10.xxx.xx.191:9300]]{http.enabled=true, master=false}
   Hot threads at 2016-01-29T00:23:00.920Z, interval=500ms, busiestThreads=3, ignoreIdleThreads=true:
   
    0.0% (205.8micros out of 500ms) cpu usage by thread 'elasticsearch[data-04][transport_client_timer][T#1]{Hashed wheel timer #1}'
     10/10 snapshots sharing following 5 elements
       java.lang.Thread.sleep(Native Method)
       org.elasticsearch.common.netty.util.HashedWheelTimer$Worker.waitForNextTick(HashedWheelTimer.java:445)
       org.elasticsearch.common.netty.util.HashedWheelTimer$Worker.run(HashedWheelTimer.java:364)
       org.elasticsearch.common.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
       java.lang.Thread.run(Thread.java:745)

[brian@data-04 ~]$ curl -XGET localhost:9200/_nodes/_local/hot_threads
::: [data-04][ascqiAteQRG65CIk78FMbg][data-04.novalocal][inet[/10.xxx.xx.191:9300]]{http.enabled=true, master=false}
   Hot threads at 2016-01-29T00:23:08.391Z, interval=500ms, busiestThreads=3, ignoreIdleThreads=true:
   
    0.0% (161.8micros out of 500ms) cpu usage by thread 'elasticsearch[data-04][transport_client_timer][T#1]{Hashed wheel timer #1}'
     10/10 snapshots sharing following 5 elements
       java.lang.Thread.sleep(Native Method)
       org.elasticsearch.common.netty.util.HashedWheelTimer$Worker.waitForNextTick(HashedWheelTimer.java:445)
       org.elasticsearch.common.netty.util.HashedWheelTimer$Worker.run(HashedWheelTimer.java:364)
       org.elasticsearch.common.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
       java.lang.Thread.run(Thread.java:745)

[brian@data-04 ~]$ curl -XGET localhost:9200/_nodes/_local/hot_threads
::: [data-04][ascqiAteQRG65CIk78FMbg][data-04.novalocal][inet[/10.xxx.xx.191:9300]]{http.enabled=true, master=false}
   Hot threads at 2016-01-29T00:23:41.041Z, interval=500ms, busiestThreads=3, ignoreIdleThreads=true:

[brian@data-04 ~]$ curl -XGET localhost:9200/_nodes/_local/hot_threads
::: [data-04][ascqiAteQRG65CIk78FMbg][data-04.novalocal][inet[/10.xxx.xx.191:9300]]{http.enabled=true, master=false}
   Hot threads at 2016-01-29T00:23:45.638Z, interval=500ms, busiestThreads=3, ignoreIdleThreads=true:
   
    0.0% (186.9micros out of 500ms) cpu usage by thread 'elasticsearch[data-04][transport_client_timer][T#1]{Hashed wheel timer #1}'
     10/10 snapshots sharing following 5 elements
       java.lang.Thread.sleep(Native Method)
       org.elasticsearch.common.netty.util.HashedWheelTimer$Worker.waitForNextTick(HashedWheelTimer.java:445)
       org.elasticsearch.common.netty.util.HashedWheelTimer$Worker.run(HashedWheelTimer.java:364)
       org.elasticsearch.common.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
       java.lang.Thread.run(Thread.java:745)

[brian@data-04 ~]$ curl -XGET localhost:9200/_nodes/_local/hot_threads
::: [data-04][ascqiAteQRG65CIk78FMbg][data-04.novalocal][inet[/10.xxx.xx.191:9300]]{http.enabled=true, master=false}
   Hot threads at 2016-01-29T00:23:50.133Z, interval=500ms, busiestThreads=3, ignoreIdleThreads=true:
   
    0.0% (188.4micros out of 500ms) cpu usage by thread 'elasticsearch[data-04][transport_client_timer][T#1]{Hashed wheel timer #1}'
     10/10 snapshots sharing following 5 elements
       java.lang.Thread.sleep(Native Method)
       org.elasticsearch.common.netty.util.HashedWheelTimer$Worker.waitForNextTick(HashedWheelTimer.java:445)
       org.elasticsearch.common.netty.util.HashedWheelTimer$Worker.run(HashedWheelTimer.java:364)
       org.elasticsearch.common.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
       java.lang.Thread.run(Thread.java:745)

(Jason Tedor) #5

Okay, that's fine. Any messages in the logs about GC durations? Are you able to get a stack dump?
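For the GC check: Elasticsearch 1.x logs slow collections through the JVM monitor service, so a rough scan of the logs would look like this (the log path assumes a default RPM-style layout; adjust to your install):

```shell
# Look for JvmMonitorService warnings about long GC pauses.
# "[monitor.jvm]" is the logger name Elasticsearch 1.x uses for these messages.
grep -h 'monitor.jvm' /var/log/elasticsearch/*.log | tail -n 20
```

No matches during the slow periods would support ruling out GC pauses as the cause.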


(Bdharrington7) #6

The hot_threads output is above. There are no problems with long GC times; additionally, there isn't much activity on the cluster, so there wouldn't be any large GCs.

