Making Kibana faster

I'm surprised that I haven't been able to find much information on making Kibana run faster.

Currently my Kibana dashboards are running very slowly, and I'm fairly sure the problem is on the Kibana side. For my 8 visualizations, the average Query Duration is about 60ms while the Request Duration is about 38000ms. That tells me Kibana is the bottleneck, but I can't figure out how to make it faster. In the network profiler, _msearch accounts for about 35000ms of that time. When I look at Kibana's CPU and RAM usage, both are low. (I'm no expert, so I don't really know what else to look at.)

Does Kibana need more CPU power or RAM? If so, how do I give it more? How can I make the Request Duration faster? (I've already checked all the visualizations and none are a bottleneck)

Currently I'm running ES 5.3.2 with 3 master nodes on separate machines, plus 1 coordinating node that runs on the same machine as Kibana. All the machines have 2 cores and 12 GB of RAM.

_msearch is an Elasticsearch API. When Kibana calls it, it bundles a batch of search requests into one call, passes them to Elasticsearch, and renders the results into visualizations. The Kibana server does almost no work while this happens, so giving Kibana more CPU or RAM won't help.
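To make "bundles a batch of search requests" concrete, here is a minimal sketch of the newline-delimited body that _msearch accepts: one header line and one search-body line per search, with a trailing newline. The index pattern `logstash-*`, the example queries, and the helper name `build_msearch_body` are hypothetical; the requests Kibana actually sends carry more header fields than just the index.

```python
import json


def build_msearch_body(index, queries):
    """Build an NDJSON body for the Elasticsearch _msearch API.

    Each search takes two lines: a header (here only the target index)
    followed by the search body. The payload must end with a newline.
    """
    lines = []
    for query in queries:
        lines.append(json.dumps({"index": index}))  # header line
        lines.append(json.dumps(query))              # search body line
    return "\n".join(lines) + "\n"


# Hypothetical dashboard with two panels, each issuing one search.
panels = [
    {"size": 0, "aggs": {"per_host": {"terms": {"field": "host"}}}},
    {"size": 0, "query": {"range": {"@timestamp": {"gte": "now-15m"}}}},
]
print(build_msearch_body("logstash-*", panels))
```

All of the searches travel to Elasticsearch in that single request, which is why the time spent in _msearch reflects Elasticsearch's work rather than the Kibana server's.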

There are a few things that could potentially lead to this:

When you look at "query duration," that should correspond to the "took" time of each response in the _msearch. I assume "request duration" is the round-trip time of the _msearch request itself, which you can find in the browser's dev tools. Here is an example measurement:

The took times total up to 19479ms.

When we look at the timing for this request, it turns out to be 11.57 seconds.

How can the request time be less than the sum of the took times? Because ES runs the searches in an _msearch in parallel, what we primarily care about is the one that took the longest, which in my case is 11.37 seconds. That isn't far from the 11.57 seconds of the whole request. Even though the average took time is 2.1 seconds, the request has to wait for the full payload, so the average took time isn't very useful.
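The comparison above can be scripted against an _msearch response copied from the browser's dev tools. The sketch below assumes the response has the usual shape `{"responses": [{"took": ...}, ...]}` and skips entries without a `took` (for example, failed searches). The function name `summarize_took` and the numbers in the usage example are made up for illustration, not measurements from this thread.

```python
def summarize_took(msearch_response):
    """Summarize per-search 'took' times (ms) from an _msearch response.

    Because the searches run in parallel, the round-trip time of the
    whole request should be close to max_ms, not total_ms.
    """
    took = [r["took"] for r in msearch_response.get("responses", []) if "took" in r]
    if not took:
        return {"count": 0, "total_ms": 0, "max_ms": 0, "avg_ms": 0.0}
    return {
        "count": len(took),
        "total_ms": sum(took),
        "max_ms": max(took),
        "avg_ms": sum(took) / len(took),
    }


# Hypothetical response with three searches and one failed search.
example = {
    "responses": [
        {"took": 1200},
        {"took": 300},
        {"took": 4500},
        {"error": {"type": "search_phase_execution_exception"}},
    ]
}
print(summarize_took(example))
# {'count': 3, 'total_ms': 6000, 'max_ms': 4500, 'avg_ms': 2000.0}
```

If the request duration in dev tools is close to `max_ms`, the slow panel is the bottleneck. If it is far above `max_ms`, the time is being lost somewhere other than query execution.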

If you could share some information like this in this thread, it would be really helpful to understand better what kind of problem we're facing.


I can't tell you how much I appreciate this thorough response.

So, just as you predicted, I had a saved search on my dashboard and it was a huge bottleneck. I had changed discover:sampleSize to 10000 to get better field counts in the left-hand panel of Discover, but I guess that isn't as important as a 6x improvement in performance! In a test, my queries went from 50s to 9s, which is a fantastic improvement.

It was also super useful that you showed me how to go into the network profiler > _msearch > Preview to see the specific durations for the dashboard.

I've added my use case to the GitHub issue, and I appreciate your help a lot.

Thanks!


Awesome!
