Do you have any more details, like the frequency of reconnections, where exactly the CPU usage stems from, and the agent logs? The Java agent uses an increasing backoff, so after a while it only tries to reconnect every 36 seconds.
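For illustration, here is a minimal sketch of how such an increasing backoff can work. The quadratic growth and the 36-second cap are assumptions for the example, not confirmed details of the agent's internal implementation:

```java
// Sketch of an increasing reconnect backoff, capped at 36 seconds.
// Assumption: the wait grows quadratically with the number of failed attempts.
public class ReconnectBackoff {
    private static final long MAX_BACKOFF_SECONDS = 36;
    private long failedAttempts = 0;

    /** Returns how many seconds to wait before the next reconnection attempt. */
    public long nextBackoffSeconds() {
        long backoff = Math.min(failedAttempts * failedAttempts, MAX_BACKOFF_SECONDS);
        failedAttempts++;
        return backoff;
    }

    /** Reset the backoff once a connection succeeds. */
    public void onConnectionSuccess() {
        failedAttempts = 0;
    }
}
```

With this scheme the waits would be 0, 1, 4, 9, 16, 25, 36, 36, ... seconds, which is why after a while the agent only retries roughly every 36 seconds.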
P.S.: I'm seeing a different error type now. Yesterday I saw:
[apm-reporter] ERROR co.elastic.apm.agent.report.IntakeV2ReportingEventHandler - Failed to handle event of type METRICS with this error: connect timed out
And today (CPU is OK):
ERROR co.elastic.apm.agent.report.IntakeV2ReportingEventHandler - Failed to handle event of type TRANSACTION with this error: connect timed out
Are you certain that the high CPU usage was caused by the agent? Or could the cause be somewhere else, given that the APM Server was not the only component that had an outage?
I just ran some benchmarks without an APM Server and I couldn't reproduce the high CPU usage.
I can't reproduce the same behaviour; the CPU is fine now. Yesterday the APM Server was crashing because Elasticsearch was crashing with the "No shards available or All shards failed" error.
I'm trying to reproduce the same behaviour. When the APM Server recovered, the CPU load went down instantly.
Actually, I can't reproduce it. Maybe it was some side effect. If I see the same CPU load again, I will report it.
Could the high CPU usage be caused by the Elasticsearch or APM Server pods, rather than your application/the agent?
The pods with heavy CPU load were our Java microservices (the ones with the agent). It may also be worth mentioning that the AWS CPU credits were at 0 (t2 unlimited was enabled).