APM performance impact on .NET Core Windows service

I need some help identifying the cause of a significant performance impact on a Windows service after instrumenting it with APM.

We have a .NET Core 3.1 Windows service running in a microservice architecture with RabbitMQ as our message broker. Under normal circumstances this service processes around 20 messages per second, but after setting up Elastic APM, throughput has dropped to around 3-5 messages per second.

APM has been set up using the Public API as described in the agent documentation (Public API | APM .NET Agent Reference [master] | Elastic), with just 2 transactions and 3 spans, together with the SqlClientDiagnosticSubscriber.

Adding the APM instrumentation is the only code change between the two versions whose RabbitMQ consumption rates we compared. I should also say that APM itself and reporting to the APM server are working fine, with metrics being captured, etc.

A few notes on what we tried and what we observe:

  • The service has been run both with and without APM for long stretches of time, so this is not related to periodic irregularities in message sizes

  • We have another Web API set up with the static APM API, and it's working like a charm, with no observable performance impact. This Web API reports to the same APM server

  • No errors are being reported by the APM server, and it has been allocated sufficient resources (primarily memory), so we're not experiencing memory swapping or similar

  • The Windows service in question uses a bit more CPU and memory on the server with APM enabled, but it's not maxing out or anything

  • The APM configuration for the service looks like this:

       ```
       "ElasticApm": {
         "Environment": "Staging",
         "TransactionSampleRate": 1.0,
         "StackTraceLimit": 5,
         "SpanFramesMinDuration": "0ms",
         "ServerUrls": "serverurl"
       }
       ```
    
  • APM server is running Elastic version 7.12

  • APM Agent installed in service is version 1.8.1

  • I haven't adjusted the TransactionSampleRate as I'm a bit hesitant about scaling down on what I see as the core functionality of APM - capturing all transactions. Not necessarily with full stack traces, but at least the basic metrics for all transactions.
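
For reference, a reduced sample rate would be configured roughly as in the fragment below. The value 0.2 is an illustrative assumption, not a recommendation; the other settings are unchanged from the configuration above. According to the .NET agent's configuration reference, unsampled transactions are still reported with their overall duration and result, only without spans or context, so basic per-transaction metrics should remain available.

```json
"ElasticApm": {
  // Illustrative assumption: capture full detail for roughly 20% of transactions
  "TransactionSampleRate": 0.2,
  "Environment": "Staging",
  "StackTraceLimit": 5,
  "SpanFramesMinDuration": "0ms",
  "ServerUrls": "serverurl"
}
```

The `//` comment is only there to flag the changed line. The .NET JSON configuration provider should tolerate it, but it can be removed.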

APM server config can be supplied if needed.

Any ideas on what to look for or adjust?
Thanks!

Hi @whitefield,

Nothing really stands out to me here. We have docs on what you can do to tune performance; you have already set SpanFramesMinDuration to 0, which usually reduces the overhead by a lot.
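
As a rough sketch of the stack-trace-related settings, the fragment below turns stack trace collection off entirely. It assumes stack traces are not needed at all for this service. As far as the agent's configuration reference describes it, a `StackTraceLimit` of 0 disables stack trace collection and a `SpanFramesMinDuration` of `0ms` disables it for spans. The remaining values are copied from the configuration posted above.

```json
"ElasticApm": {
  "Environment": "Staging",
  "TransactionSampleRate": 1.0,
  // Assumption: stack traces are not needed, so capture no frames at all
  "StackTraceLimit": 0,
  "SpanFramesMinDuration": "0ms",
  "ServerUrls": "serverurl"
}
```

Whether this makes a measurable difference here is uncertain, since span stack traces are already disabled in the posted configuration.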

Can you attach a CPU sample? For example with PerfView, or with some other tool.

Hi @GregKalapos,

Thanks for your reply. I'll look into creating a CPU sample. Meanwhile, I've added what we have in the APM server config below, omitting all the settings we're not using. Do you see anything unusual or missing here? Can the APM server itself become a bottleneck without throwing errors?

```
#================================= General =================================
queue.mem.events: 6000

#-------------------------- Elasticsearch output --------------------------
output.elasticsearch.enabled: true
output.elasticsearch.worker: 2
output.elasticsearch.bulk_max_size: 3000
```
