No thread pool info on dashboard of marvel 2.1?

monitoring

(Makeyang) #1

I'd like to confirm this, because it is very useful info for my use case.


(Mark Walkom) #2

That is correct, we're reworking all of the graphs to make sure we don't overload users, as Marvel 1 was a little crazy at times!

I'll let the team know about this though, as we are always after feedback on what is important.


(Makeyang) #3

Thanks.
Hope you can add it back ASAP.


(Steve Kearns) #4

Can you help clarify what you were using the threadpool information for in Marvel 1.x? For example, what specific issues would you diagnose, and with which charts?

Thanks for sharing!


(Makeyang) #5

My cluster is write-heavy, so I have to monitor the bulk thread pool size, bulk queue size, rejection counts, etc.


(Dan Jasek) #6

I would like to see queue and reject sizes in the new Marvel as well.
In the past, I have found that monitoring queue size can provide a great early indicator that we are reaching the limits of performance of the cluster.

It would also be nice to have a graph of the CPU load from the various types of work. Something like the data provided by the hot_threads call. I am thinking an area chart showing % CPU used by search, index, management, etc.


(Chris Earle) #7

> I would like to see queue and reject sizes in the new Marvel as well.

We hear you. While the next releases of Marvel (2.4 or X-Pack Monitoring 5.0) will not display the threadpool information directly, the agent does collect the most important threadpool queue and rejection counts. As a result, you can (and we will) build a standard Kibana dashboard that visualizes some of the data we're not yet displaying. For Marvel 2.3 and 2.4, you can find this data by querying the /.marvel-es-1-*/node_stats index/type.
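As a rough sketch of what querying that index can look like: the request below pulls the most recent bulk queue and rejection counts from the node_stats documents. The index pattern comes from the post above; the exact field paths (`node_stats.thread_pool.bulk.*`) are an assumption based on the 2.x node_stats mapping and may differ in your version.

```shell
# Hypothetical example: fetch the latest bulk threadpool queue/rejection
# counts from Marvel's node_stats documents. Field paths are assumed from
# the 2.x mapping -- verify them against your own .marvel-es-1-* indices.
curl -s 'localhost:9200/.marvel-es-1-*/node_stats/_search?pretty' \
  -H 'Content-Type: application/json' -d '{
  "size": 5,
  "sort": [ { "timestamp": { "order": "desc" } } ],
  "_source": [
    "timestamp",
    "node_stats.thread_pool.bulk.queue",
    "node_stats.thread_pool.bulk.rejected"
  ]
}'
```

The same `_source` fields can be used as the metric fields of a Kibana line chart, which is essentially what the dashboard mentioned above would do.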

In the not-too-distant future, we do expect to begin visualizing this type of information and it's already in the works.

> It would also be nice to have a graph of the CPU load from the various types of work. Something like the data provided by the hot_threads call.

This sounds very interesting, but I do worry about the expense and accuracy of doing it, given the relative infrequency of the Marvel agent's polling (10s by default). Perhaps CPU data combined with the forthcoming _nodes/usage API may prove useful here, though.


(Dan Jasek) #8

Sounds good. Thanks.

I have already set up my own visualizations in Kibana for the reject count; I'm looking forward to being able to do the same with queue sizes.

Yeah, I have the pleasure of not needing to figure out how to make it work. :slight_smile:
The usage API looks interesting. I don't know if a simple count will be enough, though; relocating a 1 GB shard does not use the same resources as relocating a 1 MB shard.
Also, the important bit is the relative resource cost of, for example, the relocations going on compared to the queries being processed, which would be challenging to get right.

Just spit-balling here, but it looks like hot_threads uses ThreadMXBean.getThreadCpuTime to calculate this info, which I don't think is too expensive. It might be reasonable to run this regularly and track CPU time usage, grouped into job type buckets. Tracking CPU time as threads enter/exit the thread pools, and running this every 10s doesn't seem ridiculous on the face of it.
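For what it's worth, the measurement primitive being described is cheap to try out. The sketch below (my own illustration, not Elasticsearch code) uses `ThreadMXBean.getCurrentThreadCpuTime` to measure the CPU time consumed by a chunk of work on the calling thread; bucketing by thread-pool name, as suggested above, would be layered on top by whoever samples the pool threads.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;

// Sketch of the approach discussed above: sample per-thread CPU time with
// ThreadMXBean (the same bean hot_threads relies on) and report the CPU
// consumed by a unit of work. Grouping into job-type buckets is left to
// the caller; this only demonstrates the measurement itself.
public class ThreadCpuSample {
    static final ThreadMXBean BEAN = ManagementFactory.getThreadMXBean();

    // CPU nanoseconds consumed by the current thread while running `work`.
    static long cpuNanos(Runnable work) {
        long before = BEAN.getCurrentThreadCpuTime();
        work.run();
        return BEAN.getCurrentThreadCpuTime() - before;
    }

    public static void main(String[] args) {
        if (!BEAN.isCurrentThreadCpuTimeSupported()) {
            System.out.println("per-thread CPU time not supported on this JVM");
            return;
        }
        long nanos = cpuNanos(() -> {
            long acc = 0;
            for (int i = 0; i < 10_000_000; i++) acc += i;
            if (acc == 42) System.out.println(); // keep loop from being optimized away
        });
        System.out.println("work consumed ~" + nanos / 1_000_000 + " ms of CPU");
    }
}
```

Each `getCurrentThreadCpuTime` call is a single JMX read, so sampling all pool threads every 10s should indeed be inexpensive, as suggested above.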


(system) #9