Description:
I'm using Elastic APM to monitor a Java application deployed on a Kubernetes pod. When the service initially experiences CPU stress, the APM tracer correctly switches to the PAUSED state due to the circuit breaker. However, even after the CPU load falls below the configured threshold, the tracer remains in the PAUSED state, and traces and metrics are not captured.
Expected Behavior:
The APM tracer should automatically switch back to the RUNNING state once the CPU load returns to normal levels. This would allow APM to resume capturing traces for the service.
Current Behavior:
- The CPU load initially spikes and triggers the circuit breaker, pausing the tracer.
- The CPU load subsequently returns to normal levels, but the tracer remains paused.
- Traces are not captured while the tracer is paused.
Logs:
2024-01-30 14:37:00,511 [elastic-apm-circuit-breaker] INFO co.elastic.apm.agent.impl.circuitbreaker.CircuitBreaker - Stress detected by co.elastic.apm.agent.impl.circuitbreaker.SystemCpuStressMonitor: Latest system CPU load value measured is 1.0. This is the 20th consecutive measurement that crossed the configured stress threshold - 0.95, which indicates this host is under CPU stress.
2024-01-30 14:37:00,518 [elastic-apm-circuit-breaker] INFO co.elastic.apm.agent.impl.ElasticApmTracer - Tracer switched to PAUSED state
Environment:
- Kibana version: 8.8.1
- Elasticsearch version: 8.8.0
- APM Server version: 8.8.3
- APM Agent language and version: Java (version not specified)
Additional Information:
As seen in the picture, the CPU stress subsides. The relevant settings are:
- stress_monitor_gc_relief_threshold = 0.9
- stress_monitor_gc_stress_threshold = 0.95
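To make the expected pause/resume behavior concrete, here is a minimal sketch of a two-threshold (hysteresis) breaker. It is not the agent's actual code: the class name `HysteresisBreakerSketch`, the relief value of 0.80, and the count of 3 consecutive measurements are assumptions chosen for illustration (the log above reports 20 consecutive measurements). Note also that the log names `SystemCpuStressMonitor`, while the two settings listed above are the `gc` thresholds, so the relief threshold that applies to system CPU may differ from 0.9.

```java
/**
 * Illustrative two-threshold circuit breaker. This is a hypothetical sketch,
 * NOT the Elastic APM agent's implementation.
 */
final class HysteresisBreakerSketch {
    enum State { RUNNING, PAUSED }

    private final double stressThreshold;   // at or above this counts as stress
    private final double reliefThreshold;   // at or below this counts as relief
    private final int requiredConsecutive;  // measurements needed to switch state
    private State state = State.RUNNING;
    private int consecutive = 0;

    HysteresisBreakerSketch(double stressThreshold, double reliefThreshold, int requiredConsecutive) {
        if (reliefThreshold > stressThreshold || requiredConsecutive < 1) {
            throw new IllegalArgumentException("relief must not exceed stress; count must be >= 1");
        }
        this.stressThreshold = stressThreshold;
        this.reliefThreshold = reliefThreshold;
        this.requiredConsecutive = requiredConsecutive;
    }

    /** Feeds one CPU load sample (0.0 to 1.0) and returns the resulting state. */
    State onMeasurement(double cpuLoad) {
        // While RUNNING we look for stress; while PAUSED we look for relief.
        boolean crossed = (state == State.RUNNING)
                ? cpuLoad >= stressThreshold
                : cpuLoad <= reliefThreshold;
        consecutive = crossed ? consecutive + 1 : 0;
        if (consecutive >= requiredConsecutive) {
            state = (state == State.RUNNING) ? State.PAUSED : State.RUNNING;
            consecutive = 0;
        }
        return state;
    }

    State state() {
        return state;
    }

    public static void main(String[] args) {
        // Assumed values: stress 0.95, relief 0.80, 3 consecutive samples.
        HysteresisBreakerSketch breaker = new HysteresisBreakerSketch(0.95, 0.80, 3);
        double[] samples = {1.0, 1.0, 1.0, 0.90, 0.90, 0.90, 0.70, 0.70, 0.70};
        for (double sample : samples) {
            System.out.println(sample + " -> " + breaker.onMeasurement(sample));
        }
    }
}
```

In this sketch a load of 0.90 is below the stress threshold but still above the relief threshold, so the breaker stays PAUSED until the load drops to 0.80 or lower for three samples in a row. If the agent behaves similarly, a load that hovers between the two thresholds would never trigger a resume.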
Why is the APM tracer not automatically switching back to the RUNNING state in Kubernetes after the CPU stress subsides? Are there any additional configuration or troubleshooting steps I can take to address this issue?
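As one possible troubleshooting step, the small probe below prints the system CPU load that the JVM itself reports inside the pod, which can be compared with the CPU graph from Kubernetes. This is only a sketch: the class name `CpuLoadProbe` is made up, and it assumes (unverified) that the agent's system CPU monitor relies on a comparable JMX value. In a container the JVM's view of "system" CPU can differ from the pod-level metric depending on the JDK version.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.OperatingSystemMXBean;

/** Hypothetical diagnostic helper; not part of the Elastic APM agent. */
final class CpuLoadProbe {

    /**
     * Returns the system CPU load as reported by the JVM (0.0 to 1.0),
     * or -1.0 when the com.sun.management extension is unavailable.
     * The JVM itself may also return a negative value if no reading exists yet.
     */
    @SuppressWarnings("deprecation") // getSystemCpuLoad is deprecated on newer JDKs but still present
    static double systemCpuLoad() {
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
        if (os instanceof com.sun.management.OperatingSystemMXBean) {
            return ((com.sun.management.OperatingSystemMXBean) os).getSystemCpuLoad();
        }
        return -1.0;
    }

    public static void main(String[] args) throws InterruptedException {
        // Take a few samples one second apart; early samples may be 0 or negative.
        for (int i = 0; i < 5; i++) {
            System.out.printf("systemCpuLoad=%.2f availableProcessors=%d%n",
                    systemCpuLoad(), Runtime.getRuntime().availableProcessors());
            Thread.sleep(1000);
        }
    }
}
```

If the value printed here stays near 1.0 while the pod's own CPU graph shows low usage, the JVM is likely measuring something broader than the container, which would explain why the relief condition is never met.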