APM Java agent causes my service's memory to fill up and the service crashes


Kibana version: 7.7.0

Elasticsearch version: 7.7.0

APM Server version: 7.7.0

APM Agent language and version: Java agent, 1.16

Browser version: Chrome

APM Server config:

```yaml
  host: ""
    enabled: true
      limit: 1000
      lru_size: 100000
    allow_origins : ['*']
    enabled: "false"
    events: 4096
    flush.min_events: 2048
setup.template.enabled: true
setup.template.name: "apm-%{[observer.version]}"
setup.template.pattern: "apm-%{[observer.version]}-*"
setup.template.fields: "${path.config}/fields.yml"
    number_of_shards: 5
    codec: best_compression
    number_of_routing_shards: 30
    mapping.total_fields.limit: 2000
  hosts: [""]
  compression_level: 1
  username: "elastic"
  password: "461e.com"
  worker: 2
    - index: "apm-%{[observer.version]}-sourcemap"
        processor.event: "sourcemap"
    - index: "apm-%{[observer.version]}-error-%{+yyyy.MM.dd}"
        processor.event: "error"
    - index: "apm-%{[observer.version]}-transaction-%{+yyyy.MM.dd}"
        processor.event: "transaction"
    - index: "apm-%{[observer.version]}-span-%{+yyyy.MM.dd}"
        processor.event: "span"
    - index: "apm-%{[observer.version]}-metric-%{+yyyy.MM.dd}"
        processor.event: "metric"
    - index: "apm-%{[observer.version]}-onboarding-%{+yyyy.MM.dd}"
        processor.event: "onboarding"
  bulk_max_size: 20480
```

We have deployed APM in our production environment. Recently we found a serious problem: under heavy request load, the APM Java agent causes the server's memory and CPU usage to max out. If the agent is not deployed, the problem does not occur. I reproduced it in our development environment by hammering my endpoint with the ab command:

```shell
ab -n 200000 -c 4000 ''
```

During the test, memory usage rises until all of it is consumed, and the CPU is maxed out as well. Eventually the service crashes, and calls to the endpoint become very slow.
I used jmap to dump the heap for analysis:

```shell
jmap -dump:live,format=b,file=/home/1.hprof 1
```
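As a complement to jmap, here is a minimal sketch (the class name `HeapCheck` is just an illustration) of how the JVM's own management API can report heap usage from inside the service, which is handy for logging memory growth during a stress test:

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.lang.management.MemoryUsage;

public class HeapCheck {
    public static void main(String[] args) {
        // The platform MemoryMXBean exposes the same heap numbers jmap summarizes.
        MemoryMXBean memory = ManagementFactory.getMemoryMXBean();
        MemoryUsage heap = memory.getHeapMemoryUsage();
        long usedMb = heap.getUsed() / (1024 * 1024);
        long maxMb = heap.getMax() / (1024 * 1024); // -1 if the max is undefined
        System.out.println("heap used: " + usedMb + " MiB, max: " + maxMb + " MiB");
    }
}
```

Calling this periodically (or exposing it on a health endpoint) makes it easy to see whether used heap keeps climbing under load instead of being reclaimed after GC.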

I analyzed the heap dump with MAT (Eclipse Memory Analyzer); the result is shown in the image.

Memory is used by co.elastic.apm.agent.grpc.helper.GrpcHelperImpl

More detailed information

This memory is not released even after the stress test stops; the server has to be restarted.

Thanks for reporting!

This seems to be related to a gRPC-instrumentation memory leak that we are already aware of and looking into.
To verify, please set the disable_instrumentations config option to grpc and check whether the problem still reproduces.
If it does not, please watch this GitHub issue so you get notified once it is resolved.
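For reference, one way to pass that option is as a JVM system property when attaching the agent; this is a sketch with placeholder paths and service name:

```shell
# Disable only the gRPC instrumentation; all other instrumentations stay active.
# The agent jar path and service name below are placeholders for your setup.
java -javaagent:/path/to/elastic-apm-agent-1.16.0.jar \
     -Delastic.apm.disable_instrumentations=grpc \
     -Delastic.apm.service_name=my-service \
     -jar my-service.jar
```

The same option can also be set in `elasticapm.properties` or via the `ELASTIC_APM_DISABLE_INSTRUMENTATIONS` environment variable.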

Yes, I know this is caused by the gRPC calls. But if I set

```
elastic.apm.disable_instrumentations = grpc
```

I can't collect the gRPC information from the service, and collecting my application's gRPC request data is exactly what I want.

That's understood; I wasn't suggesting you drop gRPC support. As I wrote, we are looking into this. I only asked you to verify that there is no observable issue when this instrumentation is off.

This topic was automatically closed 20 days after the last reply. New replies are no longer allowed.