.NET Agent is not capturing / sending spans to APM/ Elastic

Hi,

We are trying to capture tracing data from a .NET application into APM to help pinpoint performance problems, we can see HTTP requests listed under the application in Kibana, with URL and times, however we cannot see any transactions / spans / metadata within each of the URLs; we have existing applications working that are based upon Java, and we are getting transactions / spans, metadata, etcetera into APM / Elastic / Kibana. The .NET and Java applications are going into the same infrastructure / services so we do not believe it to be an infrastructure issue.

Kibana version: 6.8.0

Elasticsearch version: 6.8.22

APM Server version: 6.8.11

APM Agent language and version:

"user-agent": "apm-agent-dotnet/1.17.0+db0dc2f2389613571d7f18a674e14868e3a2c1ca (xxxxx) System.Net.Http/4.8.4084.0 .NET_Framework/4.8.4510.0"

Browser version: Chrome

Is there anything special in your setup? ALB in front of APM, but other JAVA-based services are working as expected.

Description of the problem including expected versus actual behavior. Please include screenshots (if relevant):

I expect to see spans within each of the URL entries capturing function / class / database calls within the application, yet I do not see any data under each of the URLs listed in the images.

Steps to reproduce:

  1. Deploy a .NET agent, point it at APM 6.8.11
  2. Navigate to Kibana 6.8.0, view the app APM entry to find no spans listed within

Agent Configuration

<?xml version="1.0" encoding="utf-8"?>
<!-- ... -->
<configuration>
    <!-- ... -->
    <appSettings>
        <!-- ... -->
        <add key="ElasticApm:ServerUrl" value="https://apm.xxxxxxxx.xxx:443" />
        <add key="ElasticApm:ServiceName" value="xxxxxxxx" />
        <add key="ElasticApm:Environment" value="development" />
        <!-- ... -->
    </appSettings>
    <!-- ... -->
</configuration>

Provide logs and/or server output (if relevant):

APM Agent log output

[2022-09-09 13:47:48.351 +01:00][Debug] - {AbstractConfigurationReader (FullFrameworkConfigReader)} Service name provided in configuration is XXXXXXXX
[2022-09-09 13:47:48.354 +01:00][Info] - {AbstractConfigurationReader (FullFrameworkConfigReader)} The agent was started without a service version. The service version will be automatically discovered.
[2022-09-09 13:47:48.356 +01:00][Warning] - {AbstractConfigurationReader (FullFrameworkConfigReader)} Failed to discover service version, the service version will be omitted.
[2022-09-09 13:47:48.370 +01:00][Debug] - {SystemInfoHelper} No /proc/self/cgroup found - the agent will not report container id
[2022-09-09 13:47:48.390 +01:00][Debug] - {BackendCommUtils} Building HTTP client with BaseAddress: https://apm.xxxxxx.xxx/ for BackendCommComponentBase (PayloadSenderV2)...
[2022-09-09 13:47:48.421 +01:00][Debug] - {AbstractConfigurationReader (FullFrameworkConfigReader)} GlobalLabels configuration option doesn't have a valid value - using default (empty map)
[2022-09-09 13:47:48.425 +01:00][Debug] - {AbstractConfigurationReader (FullFrameworkConfigReader)} MaxQueueEventCount configuration option doesn't have a valid value - using default: 1000
[2022-09-09 13:47:48.427 +01:00][Debug] - {AbstractConfigurationReader (FullFrameworkConfigReader)} MaxBatchEventCount configuration option doesn't have a valid value - using default: 10
[2022-09-09 13:47:48.429 +01:00][Debug] - {AbstractConfigurationReader (FullFrameworkConfigReader)} FlushInterval configuration option doesn't have a valid value - using default: 00:00:10
[2022-09-09 13:47:48.434 +01:00][Debug] - {PayloadSenderV2} Using the following configuration options: Events intake API absolute URL: https://apm.xxxxxx.xxx/intake/v2/events, FlushInterval: 10s, MaxBatchEventCount: 10, MaxQueueEventCount: 1000
[2022-09-09 13:47:48.449 +01:00][Debug] - {BackendCommComponentBase (PayloadSenderV2)} Waiting for work loop started event...
[2022-09-09 13:47:48.456 +01:00][Debug] - {BackendCommComponentBase (PayloadSenderV2)} Signaling work loop started event...
[2022-09-09 13:47:48.458 +01:00][Debug] - {BackendCommComponentBase (PayloadSenderV2)} Work loop started signaled
[2022-09-09 13:47:48.464 +01:00][Debug] - {AbstractConfigurationReader (FullFrameworkConfigReader)} CentralConfig configuration option doesn't have a valid value - using default: True
[2022-09-09 13:47:48.466 +01:00][Debug] - {BackendCommUtils} Building HTTP client with BaseAddress: https://apm.xxxxxx.xxx/ for BackendCommComponentBase (CentralConfigurationFetcher)...
[2022-09-09 13:47:48.469 +01:00][Debug] - {AbstractConfigurationReader (FullFrameworkConfigReader)} CentralConfig configuration option doesn't have a valid value - using default: True
[2022-09-09 13:47:48.470 +01:00][Debug] - {AbstractConfigurationReader (FullFrameworkConfigReader)} CentralConfig configuration option doesn't have a valid value - using default: True
[2022-09-09 13:47:48.472 +01:00][Debug] - {AbstractConfigurationReader (FullFrameworkConfigReader)} CentralConfig configuration option doesn't have a valid value - using default: True
[2022-09-09 13:47:48.474 +01:00][Debug] - {CentralConfigurationFetcher} Central configuration feature is enabled because CentralConfig option's value is True (default value is True)
[2022-09-09 13:47:48.476 +01:00][Debug] - {AbstractConfigurationReader (FullFrameworkConfigReader)} CentralConfig configuration option doesn't have a valid value - using default: True
[2022-09-09 13:47:48.477 +01:00][Debug] - {AbstractConfigurationReader (FullFrameworkConfigReader)} CentralConfig configuration option doesn't have a valid value - using default: True
[2022-09-09 13:47:48.480 +01:00][Debug] - {CentralConfigurationFetcher} Combined absolute URL for APM Server get central configuration endpoint: `https://apm.xxxxxx.xxx/config/v1/agents?service.name=XXXXXXXX&service.environment=development'. Service: Service{Name: XXXXXXXX, Version: null, Environment: development, Runtime: Runtime{Name: .NET Framework, Version: 4.8.4510.0}, Framework: null, Agent: AgentC{Name: dotnet, Version: 1.17.0+db0dc2f2389613571d7f18a674e14868e3a2c1ca}, Language: null, Node: Node{ConfiguredName: null}}.
[2022-09-09 13:47:48.483 +01:00][Debug] - {BackendCommComponentBase (CentralConfigurationFetcher)} Waiting for work loop started event...
[2022-09-09 13:47:48.489 +01:00][Debug] - {BackendCommComponentBase (CentralConfigurationFetcher)} Signaling work loop started event...
[2022-09-09 13:47:48.491 +01:00][Debug] - {BackendCommComponentBase (CentralConfigurationFetcher)} Work loop started signaled
[2022-09-09 13:47:48.643 +01:00][Debug] - {CentralConfigurationFetcher} Exception was thrown while fetching configuration from APM Server and parsing it. ETag: `<null>'. URL: `https://apm.xxxxxx.xxx/config/v1/agents?service.name=XXXXXXXX&service.environment=development'. Apm Server base URL: `https://apm.xxxxxx.xxx/'. WaitInterval: 5m. dbgIterationsCount: 1.
+-> Request:
    Method: GET, RequestUri: 'https://apm.xxxxxx.xxx/config/v1/agents?service.name=XXXXXXXX&service.environment=development', Version: 1.1, Content: <null>, Headers:
    {
      User-Agent: apm-agent-dotnet/1.17.0+db0dc2f2389613571d7f18a674e14868e3a2c1ca
      User-Agent: (XXXXXXXX)
      User-Agent: System.Net.Http/4.8.4084.0
      User-Agent: .NET_Framework/4.8.4510.0
    }
+-> Response:
    StatusCode: 404, ReasonPhrase: 'Not Found', Version: 1.1, Content: System.Net.Http.StreamContent, Headers:
    {
      Connection: keep-alive
      Date: Fri, 09 Sep 2022 12:47:49 GMT
      Set-Cookie: xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx; Expires=Fri, 16 Sep 2022 12:47:49 GMT; Path=/; SameSite=None; Secure
      X-Content-Type-Options: nosniff
      Content-Length: 19
      Content-Type: text/plain; charset=utf-8
    }
+-> Response body [length: 19]:
    404 page not found
+-> Exception: Elastic.Apm.BackendComm.CentralConfig.CentralConfigurationFetcher+FailedToFetchConfigException: HTTP status code is Not Found (404) which most likely means that APM Server is an old (pre 7.3) version which doesn't support the central configuration endpoint
   at Elastic.Apm.BackendComm.CentralConfig.CentralConfigurationResponseParser.InterpretResponseStatusCode(HttpResponseMessage httpResponse, WaitInfoS waitInfo)
   at Elastic.Apm.BackendComm.CentralConfig.CentralConfigurationResponseParser.ParseHttpResponse(HttpResponseMessage httpResponse, String httpResponseBody)
   at Elastic.Apm.BackendComm.CentralConfig.CentralConfigurationFetcher.WorkLoopIteration()

[2022-09-09 13:47:48.649 +01:00][Info] - {MetricsCollector} Collecting metrics in 30000 milliseconds interval
[2022-09-09 13:47:48.655 +01:00][Debug] - {ElasticApmModule.#1} Found ASP.NET version: 4.8.4494.0
[2022-09-09 13:47:48.658 +01:00][Debug] - {AbstractConfigurationReader (FullFrameworkConfigReader)} CaptureHeaders configuration option doesn't have a valid value - using default: True
[2022-09-09 13:47:48.661 +01:00][Debug] - {HttpDiagnosticListener} Current runtime is detected as Full Framework. RuntimeInformation.FrameworkDescription: .NET Framework 4.8.4510.0
[2022-09-09 13:47:48.675 +01:00][Debug] - {DiagnosticInitializer} Subscribed Elastic.Apm.DiagnosticListeners.HttpDiagnosticListenerFullFrameworkImpl to `System.Net.Http.Desktop' events source
[2022-09-09 13:47:48.677 +01:00][Debug] - {ElasticApmModule.#1} Initialized Agent singleton. .NET runtime: .NET Framework 4.8.4510.0; IIS: 10.0
[2022-09-09 13:47:48.744 +01:00][Warning] - {PayloadSenderV2} Failed parsing APM Server version - version string not available
[2022-09-09 13:47:49.383 +01:00][Debug] - {ElasticApmModule.#1} Incoming request processing started - starting trace...
[2022-09-09 13:47:49.392 +01:00][Debug] - {ElasticApmModule.#1} Incoming request doesn't have elastic-apm-traceparent header - it means request doesn't have incoming distributed tracing data
[2022-09-09 13:47:49.395 +01:00][Debug] - {ElasticApmModule.#1} Incoming request doesn't have valid incoming distributed tracing data - starting trace with new trace ID
[2022-09-09 13:47:49.402 +01:00][Debug] - {AbstractConfigurationReader (FullFrameworkConfigReader)} No transaction sample rate provided. Defaulting to '1'
[2022-09-09 13:47:49.417 +01:00][Debug] - {Tracer} Starting Transaction{Id: 9eb9e8421f820212, TraceId: bd92ac0404fc88e5f1c10a02867b6999, ParentId: null, Name: GET /, Type: request, Outcome: Unknown, IsSampled: True}
[2022-09-09 13:47:49.497 +01:00][Debug] - {ElasticApmModule.#1} Incoming request processing finished - ending trace...
[2022-09-09 13:47:49.515 +01:00][Debug] - {PayloadSenderV2} Enqueued Transaction. newEventQueueCount: 1. MaxQueueEventCount: 1000. Transaction: Transaction{Id: 9eb9e8421f820212, TraceId: bd92ac0404fc88e5f1c10a02867b6999, ParentId: null, Name: GET /, Type: request, Outcome: Success, IsSampled: True}.
[2022-09-09 13:47:49.530 +01:00][Debug] - {ElasticApmModule.#1} Incoming request processing started - starting trace...
[2022-09-09 13:47:49.534 +01:00][Debug] - {ElasticApmModule.#1} Incoming request doesn't have elastic-apm-traceparent header - it means request doesn't have incoming distributed tracing data
[2022-09-09 13:47:49.537 +01:00][Debug] - {ElasticApmModule.#1} Incoming request doesn't have valid incoming distributed tracing data - starting trace with new trace ID
[2022-09-09 13:47:49.540 +01:00][Debug] - {AbstractConfigurationReader (FullFrameworkConfigReader)} No transaction sample rate provided. Defaulting to '1'
[2022-09-09 13:47:49.542 +01:00][Debug] - {Tracer} Starting Transaction{Id: d0258d9e52635600, TraceId: 878b26aefcbc4009e67da401698aa44f, ParentId: null, Name: GET /ADLogin.aspx, Type: request, Outcome: Unknown, IsSampled: True}
[2022-09-09 13:47:50.245 +01:00][Debug] - {ElasticApmModule.#1} Incoming request processing finished - ending trace...
[2022-09-09 13:47:50.247 +01:00][Debug] - {PayloadSenderV2} Enqueued Transaction. newEventQueueCount: 2. MaxQueueEventCount: 1000. Transaction: Transaction{Id: d0258d9e52635600, TraceId: 878b26aefcbc4009e67da401698aa44f, ParentId: null, Name: GET /ADLogin.aspx, Type: request, Outcome: Success, IsSampled: True}.

APM Server Output

2022-09-13T14:37:53.697Z	INFO	[request]	beater/common_handlers.go:272	handled request	{"request_id": "cb4d2f60-6cc8-4182-ac37-5df0c7e88263", "method": "POST", "URL": "/intake/v2/events", "content_length": 1199, "remote_address": "xxx.xxx.xxx.xxx", "user-agent": "apm-agent-dotnet/1.17.0+db0dc2f2389613571d7f18a674e14868e3a2c1ca (xxxxxxxxxx) System.Net.Http/4.8.4084.0 .NET_Framework/4.8.4510.0", "response_code": 202}

Hi @dwmri

one important thing to note here is that the .NET agent doesn't have a feature similar to the inferred spans feature in the Java agent.

Having said that, the .NET Agent would still capture spans for supported technologies (typically things like DB calls, outgoing HTTP calls and similar).

How did you turn on the agent? To me it seems you added the IIS module into the web.config - is that right?

If so, then maybe a better and easier way to turn on the agent would be to use the profiler based agent: Profiler Auto instrumentation | APM .NET Agent Reference [1.12] | Elastic

The clear advantage of that (especially on Full Framework which you have here) is that it'll turn on a whole bunch of auto instrumentation which then capture spans for supported technologies.

Hi Greg,

Thank you for the speedy response.

We have tried going down the Profiler route as suggested, however we are hitting some issues with that when trying to start the application with the APM agent.

This is showing in the Event Viewer upon application startup:
"Session "EtwSessionForCLRElasticApm_8d1790f8-bfee-429f-8cc2-31e88a418bf6" failed to start with the following error: 0xC0000022"

This is the configuration that has been injected:

We think it may be a permissions issue but we cannot find anything in the documentation that would point to where permissions need to be configured, would this perhaps need file permission changes to the DDL, or some other configuration within IIS to get the application and APM agent running?

Many thanks,

Dan

Hi @dwmri,

that ETW session is opened by the .NET APM Agent in order to collect GC metrics - and based on the error you pasted here, creating the ETW session fails.

The only consequence of this is that the agent won't be able to collect GC metrics, but everything else should work - so it should still capture transactions, spans, and all the other metrics. Btw. I don't think profiler vs. other way of injecting the agent changes anything around the creation of ETW session - so maybe you had this with your previous setup as well. Also, in the .NET APM Agent logs, there should be a more detailed warning message which explains this.

So, what I'm trying to say: that error message is only about capturing GC statistics, everything else should work.

I'm not sure what the exact problem is in your case, so let's first clarify: does the agent in general work? Or it doesn't event capture anything?

If it doesn't capture anything, I'd suggest to ignore this error right now, because it's not really related to the core problem (we can get back to it, but if the agent doesn't capture any data, then fixing the ETW session won't fix the overall issue).

So, if the agent doesn't capture anything, then could you maybe attach the agent logs, first from the profiler? Details here under ELASTIC_APM_PROFILER_LOG_DIR Profiler Auto instrumentation | APM .NET Agent Reference [master] | Elastic

Hi Greg,

The agent does work as we are still seeing URLs and performance metrics such as milliseconds per request, CPU, and memory spent on server etc.

The spans / transactions are still not being captured.

I have attached the trace output taken from VS Code, however the elastic APM log output is empty - I am not sure if the trace output taken from VS Code is what should be in the APM log, but hopefully it will help troubleshooting the missing span information.

Hi,

We've tried a few extra steps to help pinpoint the issue but we're still coming up short.

We've tried the following actions, and the only difference it is making is that some settings cause the trace log to contain an error referencing the performance profiler manager and a failure due to a fileNotFound Error, and with some other settings the trace log is empty.

We have tried running the application within IIS 10.x+ with COMPLUS_LoaderOptimization = 1, and we have also tried with ELASTIC_APM_PROFILER_EXCLUDE_INTEGRATIONS=AspNet. Both of these settings do not fix the missing spans, however from IIS and/ or VSCode with settings enabled / disable the metrics are still being captured.

Our error is similar in output to the below issue, however to our knowledge we are not getting the singleton error so we're presumming the agent is starting up OK, especially as we're seeing metrics coming through from the application. Auto instrumention - Metrics work, traces not received - #19 by forloop

This is the error and behaviour we are seeing when we disable the COMPLUS_ parameter:

Also, is there a support plan available with Elastic that would cover self-hosted setups - we are running Elasticsearch, multiple beats, Logstash, Kibana, and APM for multiple products in Production so having paid support would be very useful. I can see the Platinum subscription is available, does this cover self-hosted setups?

Hi @dwmri,

just to make sure we shoot for a realistic goal here: what you'll have with the profiler (and the reason I suggested is) that it covers a bit more auto instrumentation than your original setup.

But that'd still only capture specific things - what those specific things are is listed here: Supported technologies | APM .NET Agent Reference [1.12] | Elastic

So if your app does let's say a call to an oracle database, then the agent will capture that. But the profiler based agent still does not have the inferred span feature from java - so it won't capture your custom methods (let's say in a class where you have only your code and you don't call any of the libraries which are supported by auto instrumentation listed above).

So... what kind of spans would you exactly expect? If it's something from your own code, then unfortunately that is a feature we'll need to implement. It's definitely on our radar, but it's not currently implemented.

Hi Greg,

Here are the libraries in use:

  • ASP.NET (.NET Framework) in IIS, 4.6.1+ (IIS 7.0 or newer)
  • System.Net.Http.HttpClient
  • System.Net.HttpWebRequest

We use Oracle, however not the dll version supported.

So it sounds like the function / classes called from the above HTTP frameworks won't produce any kind of detailed tracing unless code is explicitly written to capture spans?