All dashboard visualisations in 'stalled' state for 30s to 1m

First post here. I'm hoping to get some help with this issue, or to be told whether this is normal behavior.

Background:

  • We have a cluster of 3 nodes with 32 GB each, running on VM / Docker. Each node has 16 GB allocated to the Java heap.
  • Our ingest is about 10 GB a day into a daily index with three primary shards and 1 replica.
  • Indexing seems fine, with a rate of about 500/s across all shards and a latency of around 0.7 ms.
  • We have one dashboard with about 18 visualisations (9 bar graphs, 9 tables) on it.
  • When viewing the dashboard, the search rate and search latency can't be seen because Kibana stops responding (I know we should monitor from a different cluster for exactly this reason).
  • Rough numbers during report generation are client response times of around 25-30 s, around 80 HTTP connections, and around 175 client requests.

Issue:
The dashboard takes about 1 minute to load fully, and there are only 10 days (10 x 10 GB of logs) of data to aggregate. We want this to be a monthly report, so if it takes 3 minutes to do 30 days of data (300 GB), that seems too long.
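To put the figures above side by side, here is a rough back-of-envelope sketch. It assumes the index size on disk roughly matches the 10 GB daily ingest and that load time scales linearly with the number of days queried; both are simplifications, and the function name `report_footprint` is just for illustration.

```python
# Back-of-envelope sizing from the figures in this post.
# Assumptions: on-disk size ~= daily ingest, and load time grows
# linearly with the number of daily indices queried.

DAILY_INGEST_GB = 10   # per daily index
PRIMARY_SHARDS = 3     # per daily index
REPLICAS = 1           # replica copies of each primary


def report_footprint(days, measured_days=10, measured_load_s=60):
    """Estimate what a dashboard spanning `days` daily indices touches."""
    primaries = days * PRIMARY_SHARDS
    return {
        "primary_data_gb": days * DAILY_INGEST_GB,
        "primary_shards": primaries,
        "total_shard_copies": primaries * (1 + REPLICAS),
        "gb_per_primary_shard": round(DAILY_INGEST_GB / PRIMARY_SHARDS, 2),
        "estimated_load_s": measured_load_s * days / measured_days,
    }


for days in (10, 30):
    print(days, report_footprint(days))
```

Since a search queries one copy of every shard in the indices it targets, each request over 30 days would fan out to 90 shards of only about 3.3 GB each under these assumptions, and the dashboard issues at least one such request per visualisation.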

Observations:
When looking at the stack monitoring, the system load on each node goes to about 4-6 during report creation.
When looking at the Network waterfall for the data requests, the stalled state increases from the first visualisation to the last, i.e. the first visualisation is in a stalled state for 9 seconds and the last is in a stalled state for 48 seconds. The TTFB is about 15 s across the board:


Queued at 384.48 ms
Started at 385.46 ms

Resource Scheduling
  Queueing: 0.98 ms
Connection Start
  Stalled: 47.49 s
Request/Response
  Request sent: 0.17 ms
  Waiting (TTFB): 16.41 s
  Content Download
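One possible explanation for a stalled time that grows from the first request to the last, offered as a sketch rather than a diagnosis: browsers such as Chrome cap the number of simultaneous HTTP/1.1 connections per host (6 in Chrome), so later requests wait for a free connection. The toy model below assumes all 18 requests are queued at once and each holds a connection for about 16 s (close to the TTFB above); the function name and parameters are made up for illustration.

```python
import heapq


def stalled_times(n_requests, service_time_s, max_connections=6):
    """Return how long each request waits for a free connection.

    All requests are queued at t=0 and served first-come, first-served
    by `max_connections` slots, each busy for `service_time_s` seconds
    per request.
    """
    # Min-heap of the times at which each connection slot becomes free.
    free_at = [0.0] * max_connections
    heapq.heapify(free_at)
    waits = []
    for _ in range(n_requests):
        start = heapq.heappop(free_at)  # earliest available slot
        waits.append(start)             # time spent stalled before sending
        heapq.heappush(free_at, start + service_time_s)
    return waits


waits = stalled_times(n_requests=18, service_time_s=16.0)
print(waits)  # six requests at 0.0, six at 16.0, six at 32.0
```

The model only reproduces the stepwise growth, not the exact 9 s and 48 s figures, so it is a hint about where to look (connection queueing in the browser versus slow responses from the server), not a confirmed cause.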

Firstly, is this stalled state normal behavior?
Secondly, if not, how can I determine what's causing it?

Cheers
Rob.

What version of Kibana are you running? What browser are you using? What do your visualizations look like? Are they rendering large numbers of bars with lots of splits?

Hi
On 7.3.1 across the stack.
Chrome and IE both produce the same "stalled" time frames.

I have done more investigation, and in case others have tried to do this, I'll post what the issue was:
At the top of the report I have a drop-down list that shows the client list, used to filter the dashboard by client.
I used the markdown feature of TSVB to insert that client name into the description text for each visualisation on the dashboard.
The text embedded with the client name is (ironically) what was causing the dashboard to take minutes to load.
Once I removed the client name lookup from the text, the dashboard load time went down to a respectable 30 s or so.

I would appreciate it if anyone could suggest a better way of doing this.
I tried creating an index containing just the client names and using that as the lookup for the drop-down list and the filter for the dashboard. Even though the client name data comes from a different index, I used the same variable name, and it seems to filter the visuals correctly.
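A minimal sketch of how such a lookup index could be populated, assuming a hypothetical index name `client-lookup` and field name `client` (the field would need to match whatever the log indices use). It only builds the NDJSON body for the Elasticsearch `_bulk` API; sending it is left out.

```python
import json


def client_lookup_bulk_body(client_names, index="client-lookup", field="client"):
    """Build an NDJSON _bulk body with one small document per client."""
    lines = []
    for name in sorted(set(client_names)):
        # Using the client name as _id makes re-running the load idempotent.
        lines.append(json.dumps({"index": {"_index": index, "_id": name}}))
        lines.append(json.dumps({field: name}))
    # The _bulk API requires the body to end with a newline.
    return "\n".join(lines) + "\n"


body = client_lookup_bulk_body(["acme", "globex", "acme"])
print(body)
```

The body would be sent with `POST /_bulk` and `Content-Type: application/x-ndjson`. Because the lookup index holds one tiny document per client, the drop-down's list of values no longer has to be aggregated from the large daily log indices.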
Is there a better way though?
Thanks,
Rob.