Slow Response Time on APM Indices

yozel · July 12, 2019, 8:48am

Hi all,

We started using Elastic APM at our company and have hit a bottleneck that makes Elasticsearch queries take too long.

We have an Elasticsearch cluster with 8 nodes, all using the default node roles. Each node has 16 vCPUs, 60GB of memory (30GB heap) and a 2TB SSD data disk.
There are 2 APM servers with 16 vCPUs and 15GB of memory.
Additionally, there is a Kibana server with 1 vCPU and 3.75GB of memory.

In the Elasticsearch cluster, we currently have 80 shards (and no replicas) for transactions and errors.
We chose this number to keep each shard between 20GB and 40GB.
When we run a search in Kibana, the vCPUs on all Elasticsearch nodes hit their peak.

Our other APM Server settings are:

max_event_size: about 3MB
queue.mem.events: 10240000
output.elasticsearch.workers: 512
output.elasticsearch.bulk_max_size: 20000
setup.template.settings.index.number_of_routing_shards: 480
setup.template.settings.index.refresh_interval: 180s
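
As a quick sanity check on the output settings above, the sketch below multiplies the worker count by the bulk size and compares the result with queue.mem.events. The function name `max_in_flight_events` is hypothetical, and the reading that the queue was sized to hold one full batch per worker is an assumption drawn from the numbers, not something the post states.

```python
def max_in_flight_events(workers: int, bulk_max_size: int) -> int:
    """Upper bound on events in flight if every worker sends one full bulk request."""
    return workers * bulk_max_size


# Values copied from the settings listed in the post.
workers = 512               # output.elasticsearch.workers
bulk_max_size = 20_000      # output.elasticsearch.bulk_max_size
queue_events = 10_240_000   # queue.mem.events

in_flight = max_in_flight_events(workers, bulk_max_size)
print(in_flight)                  # 10240000
print(in_flight == queue_events)  # True
```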

Our daily APM data is ~2TB.
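
The shard sizing target mentioned earlier can be checked with simple arithmetic. This is a minimal sketch that assumes the 80 shards hold exactly one day of data spread evenly, and that 1TB = 1000GB; neither assumption is confirmed by the post, and `shard_size_gb` is a hypothetical helper.

```python
def shard_size_gb(daily_volume_tb: float, shard_count: int) -> float:
    """Average primary shard size in GB, assuming one day of data is spread evenly."""
    return daily_volume_tb * 1000 / shard_count


size = shard_size_gb(2.0, 80)  # ~2TB per day, 80 shards, no replicas
print(f"{size:.1f} GB per shard")  # 25.0 GB per shard
print(20 <= size <= 40)            # True, inside the 20GB-40GB target
```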

Our problem is that we can't get a response when we open the APM dashboard. We investigated the queries that the Kibana APM dashboard sends and picked one as an example.

For example, when we want to see the data for the last 2 hours, the query takes over 40 seconds. You can find the profiling results of this example search query at the link below.
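
For readers who want to reproduce this kind of measurement, here is a minimal sketch of a search body with profiling enabled over a 2-hour window. It is not the query Kibana sends (that one is in the gist); the `@timestamp` field and the helper name `build_profiled_search` are assumptions made for illustration.

```python
import json


def build_profiled_search(window: str = "2h") -> dict:
    """Build a search body that asks Elasticsearch to return profiling data."""
    return {
        "profile": True,  # include per-shard timing breakdowns in the response
        "size": 0,        # no hits needed, only timing information
        "query": {
            "range": {
                # Assumed timestamp field; adjust to the index mapping in use.
                "@timestamp": {"gte": f"now-{window}", "lte": "now"}
            }
        },
    }


print(json.dumps(build_profiled_search(), indent=2))
```

The body would be sent to the `_search` endpoint of the APM indices; the exact index pattern depends on the APM Server setup.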

https://gist.github.com/yozel/a6cdcbbb08cad0539e3fd2830e5845ee

profile.json

{
  "took" : 40211,
  "timed_out" : false,
  "_shards" : {
    "total" : 323,
    "successful" : 323,
    "skipped" : 162,
    "failed" : 0
  },
  "hits" : {

(profile.json is truncated here; the gist also contains query.json and shards.txt.)
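
The header of the profile response already says a lot. The sketch below summarizes the values shown in the excerpt: how long the search took and how many shards actually ran the query, as opposed to being skipped by the pre-filter. `summarize_search` is a hypothetical helper; the numbers are copied from the excerpt.

```python
def summarize_search(took_ms: int, shards: dict) -> dict:
    """Summarize the 'took' and '_shards' fields of a search response."""
    searched = shards["total"] - shards["skipped"]
    return {
        "took_s": took_ms / 1000,
        "searched_shards": searched,
        "skipped_fraction": shards["skipped"] / shards["total"],
    }


# Values copied from the profile.json excerpt.
summary = summarize_search(
    40211, {"total": 323, "successful": 323, "skipped": 162, "failed": 0}
)
print(summary["searched_shards"])            # 161
print(round(summary["skipped_fraction"], 3)) # 0.502
```

So about half of the 323 shards were skipped, and the remaining 161 account for the 40-second response.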

We can't change the query because we have no control over what the Kibana APM dashboard sends.

Do you have any suggestions regarding our infrastructure or configuration?

Thanks in advance.
