Thanks for reporting this.
Are you running Elasticsearch with any limits or special configuration on its CPU and memory usage?
I would suggest these two next steps:
- Enable APM Server instrumentation so that you can collect APM Server metrics, including Go runtime metrics, in Elasticsearch. You can then build accurate monitoring on top of them.
- Reduce the number of threads available to APM Server for running Go code by setting the GOMAXPROCS environment variable in the APM Server environment. You can probably start with 32, since you set worker to 30. You could increase it later, but that depends on how much CPU Elasticsearch is using.
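To check that a GOMAXPROCS value actually reaches a Go process, a small standalone program like the one below can help. This is only a sketch and not part of APM Server: the helper name `schedulerInfo` is made up for illustration. It relies on the Go runtime reading the GOMAXPROCS environment variable at startup, and on `runtime.GOMAXPROCS(0)` returning the current setting without changing it.

```go
package main

import (
	"fmt"
	"os"
	"runtime"
)

// schedulerInfo is a hypothetical helper (not an APM Server API).
// It returns the number of logical CPUs visible to the process and the
// current GOMAXPROCS value. Passing 0 to runtime.GOMAXPROCS only queries
// the setting; it does not modify it.
func schedulerInfo() (numCPU, maxProcs int) {
	return runtime.NumCPU(), runtime.GOMAXPROCS(0)
}

func main() {
	numCPU, maxProcs := schedulerInfo()

	// The raw environment value, empty if the variable is not set.
	fmt.Printf("GOMAXPROCS env: %q\n", os.Getenv("GOMAXPROCS"))
	// Logical CPUs the Go runtime detected on this host or container.
	fmt.Printf("logical CPUs:   %d\n", numCPU)
	// Limit on OS threads executing Go code at the same time.
	fmt.Printf("GOMAXPROCS:     %d\n", maxProcs)
}
```

Running it the same way APM Server is started (for example with `GOMAXPROCS=32` exported in the same shell or service unit) should show whether the variable is being picked up. Without the variable, the value normally defaults to the number of logical CPUs the runtime detects.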
When you say 60k events/sec, these are not requests per second, correct?
Are we talking about 60k individual trace/metric/log events ingested each second?
Can you please clarify this? If that's the case, the APM workload is not large.
But if Elasticsearch is also used for other ingestion, that might explain the slowness: the JVM is probably taking up most of the CPU time.