TSVB dashboard - Performance issue with huge data

I have a performance issue with a TSVB dashboard over 70 million records and more (TSVB | Kibana Guide [8.13] | Elastic).
I have applied index rollover in ILM, so caching can be used for normal queries (request_cache = true), but for TSVB I have not found a way to set it.
Do you know:

Hi @david89 Welcome to the community.

Exactly what calculation / aggregation are you trying to calculate?

You can use filters to reduce the dataset, for example so that only documents containing the field you want to aggregate on are used.

Are you trying to run a cardinality aggregation on a high-cardinality data set? Here

Did you try Lens?

Dear @stephenb, thank you for your attention.
We are doing a "sum" on some field.
We must scan all records, so no filter can be applied.
Do you have any suggestions about caching or any other approach?
"Lens" => I will try it.
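
For context on the caching question, here is a minimal sketch of a sum request shaped so that the Elasticsearch shard request cache can serve it. It only builds the request and does not contact a cluster. The index pattern `my-logs-*`, the field `bytes_total`, and the `@timestamp` time field are hypothetical placeholders, not values from this thread. By default the shard request cache only stores results of `size: 0` requests, and the cached entry is keyed on the request body, so fixed time bounds are needed for repeated requests to hit the same entry. Whether TSVB's own requests benefit depends on how Kibana builds them.

```python
import json


def build_cached_sum_request(index_pattern, field, start, end):
    """Return (path, query_params, body) for a cache-friendly sum aggregation."""
    path = f"/{index_pattern}/_search"
    # request_cache=true asks Elasticsearch to use the shard request cache
    # for this request.
    params = {"request_cache": "true"}
    body = {
        "size": 0,  # aggregation only; no hits are returned
        "query": {
            # Absolute bounds: the same bounds produce the same request body.
            # "@timestamp" is an assumed time field name.
            "range": {"@timestamp": {"gte": start, "lt": end}}
        },
        "aggs": {"total": {"sum": {"field": field}}},
    }
    return path, params, body


# Hypothetical index pattern and field name, for illustration only.
path, params, body = build_cached_sum_request(
    "my-logs-*", "bytes_total",
    "2024-04-01T00:00:00Z", "2024-04-22T00:00:00Z",
)
print(path, params)
print(json.dumps(body, indent=2))
```

Rolled-over indices that no longer receive writes are the ones most likely to keep their cached entries, because a shard's cache is invalidated when the shard refreshes with changed data.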

@david89

What is the performance you are seeing?

What is the total size of the index?

What are the specs and JVM heap for the cluster?

You could also pre aggregate the Data...
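
As a rough illustration of the pre-aggregation idea, the sketch below sums a field per hour in plain Python, so that a chart would read one row per hour instead of every raw record. The thread does not say which Elasticsearch mechanism was meant; a transform or a similar job writing to a smaller summary index is one assumption. The field names `@timestamp` and `amount` are hypothetical.

```python
from collections import defaultdict
from datetime import datetime


def preaggregate_hourly(records, field):
    """Sum `field` per hour bucket, keyed by the ISO timestamp of the hour."""
    totals = defaultdict(float)
    for rec in records:
        ts = datetime.fromisoformat(rec["@timestamp"])
        # Truncate the timestamp to the start of its hour.
        hour = ts.replace(minute=0, second=0, microsecond=0)
        totals[hour.isoformat()] += rec[field]
    return dict(totals)


# Hypothetical raw documents, for illustration only.
raw = [
    {"@timestamp": "2024-04-01T10:05:00+00:00", "amount": 2.0},
    {"@timestamp": "2024-04-01T10:45:00+00:00", "amount": 3.0},
    {"@timestamp": "2024-04-01T11:10:00+00:00", "amount": 4.0},
]
hourly = preaggregate_hourly(raw, "amount")
print(hourly)
```

A "sum" over the hourly rows gives the same total as a "sum" over the raw records, which is why this approach suits the aggregation described here; it would not work unchanged for metrics such as cardinality.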

Also look at

@stephenb, good morning, here is the detailed information:

What is the performance you are seeing?

=> My index has more than 400 fields. In one dashboard I need to load 10 TSVB charts, which aggregate information from some of the fields.
Loading 1 TSVB chart is fast (~1s), but loading 10 charts at the same time fails with an error (more than 10s).

What is the total size of the index?
Total size: 74 million records, ~70GB (the cluster has 3 primary shards + 1 replica for each primary)

What are the specs and JVM heap for the cluster?
CPU: 14, RAM: 16G, SSD: 120G
=> These specs are good for a total size of 40G (the time for loading 10 charts is less than 10s),
but for more than 50G, there are errors.
I would like to apply caching before thinking about upgrading the specs (more CPU, RAM) because the cost is high.

You could also pre aggregate the Data...

Also look at
=> thank you, I will take a look

@stephenb do you have any suggestions for us?
We load one chart fast, but loading many charts (> 10) is very slow and fails.

What version are you on?
How many nodes?
You are probably just overwhelming your cluster...

There may be no simple answer.

@stephenb here is the specification:
1 cluster: 3 nodes (all are data nodes, 1 master), 3 primary shards + 1 replica per shard

And ES version: 8.12.0

What error is TSVB reporting?

Here is the error when loading a large amount of data (time range > 3 weeks; if we choose < 2 weeks, there is no error):

Error: Server Error

The server encountered a temporary error and could not complete your request.

Please try again in 30 seconds.

Interesting. How is the stack deployed? Is it on premise or a cloud deployment?

It is ECK (Basic - free) on GKE (Google Kubernetes Engine)

You probably need to check the Kubernetes side of things: is it using an Ingress or a load balancer?