Hi, I am using an on-premise Elastic Stack 7.9.0, and recently we have been facing an OutOfMemory issue on one of the Java agents.
The APM CallTrace objects are taking more than 2 GB, which is causing the issue.
Also, we are seeing some gaps in the "Output Event Rate" graph in APM Stack monitoring.
If a query/process is very heavy (causing OOM errors), the ES Stack Monitoring plugin will start throttling its collection rate, which results in gaps on the chart. There are a couple of things we'll need to figure out first.
During this period when the chart is showing gaps, are there any logs/errors in the ES/Kibana console?
Have you tried identifying the query causing the OOM (usually the slowest one)? You can do this via:
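One possible way to surface the slowest queries, as a rough sketch only: turn on the search slow log for the indices you suspect. The `index.search.slowlog.threshold.*` settings are standard Elasticsearch index settings; the thresholds, the URL and the index pattern below are assumptions to adapt, and the helper names `slowlog_settings` and `build_request` are made up for illustration.

```python
import json
import urllib.request


def slowlog_settings(warn="10s", info="5s"):
    """Build an index-settings body that enables the search slow log.

    The threshold values are placeholders; pick ones that fit your cluster.
    """
    return {
        "index.search.slowlog.threshold.query.warn": warn,
        "index.search.slowlog.threshold.query.info": info,
        "index.search.slowlog.threshold.fetch.warn": warn,
    }


def build_request(es_url, index_pattern, settings):
    """Prepare (but do not send) a PUT <index>/_settings request."""
    return urllib.request.Request(
        url=f"{es_url}/{index_pattern}/_settings",
        data=json.dumps(settings).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


if __name__ == "__main__":
    # Assumed local cluster and index pattern; nothing is sent here.
    req = build_request(
        "http://localhost:9200", ".monitoring-beats-*", slowlog_settings()
    )
    print(req.get_method(), req.full_url)
    print(req.data.decode("utf-8"))
    # urllib.request.urlopen(req)  # uncomment to actually apply the settings
```

Once applied, queries slower than the thresholds should show up in the search slow log on the data nodes, which makes the heavy one easier to spot.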
Try running the output rate query independently and see if it occasionally times out (or if the result also has gaps). Be sure to substitute your own cluster_uuid:
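As an approximation of what such a query could look like (this is a sketch, not necessarily the exact query Kibana issues), the following builds a search body for the `.monitoring-beats-*` indices. The field names (`beats_stats.metrics.libbeat.output.events.total`, `beats_stats.beat.type`, `timestamp`) and the `type: beats_stats` filter are assumptions about the 7.x monitoring documents, so check them against one of your own documents first. It also takes a single max across all APM servers per bucket, which is a simplification if you run more than one.

```python
import json


def output_rate_query(cluster_uuid, since="now-1h", interval="30s"):
    """Build a search body approximating the "Output Event Rate" chart.

    Field names are assumed from 7.x monitoring documents; verify them
    against a real document in .monitoring-beats-* before relying on this.
    """
    return {
        "size": 0,
        "query": {
            "bool": {
                "filter": [
                    {"term": {"cluster_uuid": cluster_uuid}},
                    {"term": {"type": "beats_stats"}},
                    {"term": {"beats_stats.beat.type": "apm-server"}},
                    {"range": {"timestamp": {"gte": since, "lte": "now"}}},
                ]
            }
        },
        "aggs": {
            "over_time": {
                "date_histogram": {
                    "field": "timestamp",
                    "fixed_interval": interval,
                },
                "aggs": {
                    # Cumulative counter, so take the max per bucket...
                    "events_total": {
                        "max": {
                            "field": "beats_stats.metrics.libbeat.output.events.total"
                        }
                    },
                    # ...and derive a per-second rate between buckets.
                    "events_rate": {
                        "derivative": {
                            "buckets_path": "events_total",
                            "unit": "1s",
                        }
                    },
                },
            }
        },
    }


if __name__ == "__main__":
    # "YOUR_CLUSTER_UUID" is a placeholder; use your own value.
    print(json.dumps(output_rate_query("YOUR_CLUSTER_UUID"), indent=2))
```

The printed JSON can be pasted into Kibana Dev Tools under `GET .monitoring-beats-*/_search`. In the response, `took` and `timed_out` show whether the query itself is struggling, and buckets with a null `events_total` correspond to gaps in the data.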
Another thing to try is a different time range: instead of the default 1h ago, try 15m, 6h, etc. This way we can figure out whether it's a max bucket issue.
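To make the max bucket idea concrete, here is a small arithmetic sketch. The number of date histogram buckets is roughly the time range divided by the bucket interval, multiplied by the number of series if the histogram is nested under a per-instance aggregation. The 10s interval and the `max_buckets` value below are illustrative assumptions; compare against your cluster's actual `search.max_buckets` setting.

```python
import math


def bucket_count(range_seconds, interval_seconds, series=1):
    """Approximate number of histogram buckets a query would produce."""
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    return math.ceil(range_seconds / interval_seconds) * series


def exceeds_limit(range_seconds, interval_seconds, max_buckets, series=1):
    """True if the approximate bucket count is above the given limit."""
    return bucket_count(range_seconds, interval_seconds, series) > max_buckets


if __name__ == "__main__":
    # Assumed 10s interval; time ranges taken from the suggestion above.
    for label, seconds in [("15m", 15 * 60), ("1h", 60 * 60), ("6h", 6 * 60 * 60)]:
        print(label, bucket_count(seconds, 10))
    # prints: 15m 90, 1h 360, 6h 2160 (one pair per line)
```

If the gaps only appear at ranges where this estimate approaches the limit, that points toward a bucket problem rather than a resource problem.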
This might also be because the cluster resources are under-provisioned. Have you tried adding nodes or increasing memory (JVM heap)?