Need help understanding hot threads

I have a 5 node cluster on kubernetes, 2 data nodes, 3 masters. This is sitting underneath graylog. For some reason, usually only 1 data node is extremely busy. And lately the load really high on it. I've grabbed the hot threads and pasted in link below. The specs are as follows for data nodes:
6 CPU cores, 16GB ram, 12GB heap - 1.15TB of total disk - 526GB available on each.

any suggestions on optimizing? this is running on es 6.8.3, JVM 12.0.2

Can you explain why you think you should optimize based on the hot threads output?

Judging from the lines containing cpu usage by thread this system does not look too busy to me.

that one single data node is churning, but i cant figure out what its really doing.. i assume its 'writing' ? normally my graylog instance can push out 5k-12k of messages per second. As soon as one of the data gets hit with 'high load' its down to 1000 messages per minute or less...
I am unable to figure out what is keeping the node busy where it cant process messages.
I am using the cerebro overview which is connected to the cluster and that is where i'm getting the load from. The cpu section doesnt seem busy.. but the load is really high, if that makes any sense.

This hot threads output contains two write threads that -- in all ten snapshots -- were observed to be busy doing durable writes (i.e. fsync(), assuming you're on Linux):

       java.base@12.0.2/sun.nio.ch.FileDispatcherImpl.force0(Native Method)
       java.base@12.0.2/sun.nio.ch.FileDispatcherImpl.force(FileDispatcherImpl.java:82)
       java.base@12.0.2/sun.nio.ch.FileChannelImpl.force(FileChannelImpl.java:461)

I think that's consistent with high load and low CPU utilisation. Assuming this is writing to a directly-attached local disk, some common causes are that it is too slow to keep up with the write load, or that it is actually faulty. Check dmesg for error messages about this disk, and look at some OS-level stats using iostat. You can try benchmarking its raw IO performance with fio too.

correct, this is on linux on attached disk. However this issue does not occur right away, the nodes would run for 24-36 hours for example, this issue occurs again, only pushing 1k of messages a minute into the cluster. The "fix" i have been doing was reboot each of the data nodes, as soon as the cluster is green again it pumps out logs 11k again no problem. I've repeated this multiple times now, but there might be some other underlying issue. Is there other commands i can run to help debug what the node is doing?

Not really at the Elasticsearch level, no. It's syncing data to disk, but the details of what that means and why it takes as long as it does are not really something Elasticsearch can inspect. You can try calling hot threads with ?threads=99999 to get a dump of all the threads rather than just the hottest 3, although the file you shared only had 2 active threads so I don't think there's anything else going on.

I would suggest tracking the output of iostat to see what's different between the "running smoothly" and "far too slow" phases in terms of IO.

The first set of iostat info was when the node is at high load and barely writing messages to disk. These are on EKS cluster instances so the mounts are sometimes different, but i ran the same test on them. It looks to have some reading issues and thats why its queued up?

node A
[root@ip-xx ec2-user]# iostat -m -p nvme1n1
Linux 4.14.133-113.112.amzn2.x86_64 (ip-xxA)    03/16/2020      _x86_64_        (8 CPU)

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           2.67    0.01    0.45    0.43    0.00   96.45

Device:            tps    MB_read/s    MB_wrtn/s    MB_read    MB_wrtn
nvme1n1          31.68         0.17         2.18      45521     589328

node B
[root@ip-xx ec2-user]# iostat -m -p nvme4n1
Linux 4.14.133-113.112.amzn2.x86_64 (ip-xxB)    03/16/2020      _x86_64_        (8 CPU)

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           3.34    0.01    0.69    2.18    0.00   93.77

Device:            tps    MB_read/s    MB_wrtn/s    MB_read    MB_wrtn
nvme4n1          31.94         0.17         2.24      46871     600668


after reboot

[root@ip-xx ec2-user]# iostat -m -p nvme2n1
Linux 4.14.133-113.112.amzn2.x86_64 (ip-xxA)     03/16/2020      _x86_64_        (8 CPU)

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           4.08    0.00    0.59    8.75    0.01   86.57

Device:            tps    MB_read/s    MB_wrtn/s    MB_read    MB_wrtn
nvme2n1          67.28         3.42         2.09       5341       3257

[root@ip-xx ec2-user]# iostat -m -p nvme5n1
Linux 4.14.133-113.112.amzn2.x86_64 (ip-xxB)    03/16/2020      _x86_64_        (8 CPU)

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           5.89    0.01    1.11    9.16    0.01   83.82

Device:            tps    MB_read/s    MB_wrtn/s    MB_read    MB_wrtn
nvme5n1          76.81         3.14         2.13       3343       2263

From man iostat:

The first report generated by the iostat command provides statistics concerning the time since the system was booted.

Probably not that useful -- you care about the recent performance only. So you're looking for something like iostat -m -p nvme2n1 5 to get updated stats every 5 seconds.

1 Like