Distributed tracing not working on .NET [solution]

Summary

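TraceContinuationStrategy is not a complete successor to TraceContextIgnoreSampledFalse. In Transaction.cs, when TraceContextIgnoreSampledFalse is true and the incoming distributed tracing data has no TraceState (or has neither a sample rate nor the recorded flag), the ParentId of the distributed transaction is set to null.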

This detaches the local trace from the caller's trace.

For distributed transactions, ensure that TraceContextIgnoreSampledFalse is always set to false.

Explanation

Our software runs distributed across a large number of nodes.

We were unable to get the value of the HTTP header "traceparent" to correlate the trace across the nodes.

All ASP.NET Core website instances were initialized using:

            IApmLogger elasticApmLogger = new ElasticApmLogger();

            AgentComponents agentComponents = new AgentComponents(elasticApmLogger, elasticApmConfig, null);

            Agent.Setup(agentComponents);

            HttpDiagnosticsSubscriber httpDiagnosticsSubscriber = new HttpDiagnosticsSubscriber();
            services.AddElasticApm(httpDiagnosticsSubscriber);

            AspNetCoreDiagnosticSubscriber aspNetCoreDiagnosticSubscriber = new AspNetCoreDiagnosticSubscriber();
            services.AddElasticApm(aspNetCoreDiagnosticSubscriber);

            SqlClientDiagnosticSubscriber sqlClientDiagnosticSubscriber = new SqlClientDiagnosticSubscriber();
            services.AddElasticApm(sqlClientDiagnosticSubscriber);

and each instance registered its own, separate trace.

On the receiving side in the network, the traceparent was received by a filter and had a value such as:

00-e5c1244444312af480bae5db1156ff7d-67f551c7ec652007-01

According to the W3C Trace Context specification, the 67f551...007 value is the parent ID to which the local trace should be attached.
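To make the field layout concrete, here is a minimal sketch that splits a version-00 traceparent value into its four dash-separated parts. The class TraceParentParts and its methods are hypothetical helpers written for this post; they are not part of the Elastic APM agent, which does its own parsing in DistributedTracingData.TryDeserializeFromString.

```csharp
using System;

// Hypothetical helper for illustration only; not an Elastic APM API.
public static class TraceParentParts
{
    // Splits "version-traceid-parentid-flags" and checks the field lengths
    // defined for version 00 (32 hex chars trace ID, 16 hex chars parent ID).
    public static (string Version, string TraceId, string ParentId, string Flags) Split(string traceParent)
    {
        var parts = traceParent.Split('-');
        if (parts.Length != 4 || parts[1].Length != 32 || parts[2].Length != 16 || parts[3].Length != 2)
            throw new FormatException("Not a version-00 traceparent value.");
        return (parts[0], parts[1], parts[2], parts[3]);
    }

    // The lowest bit of the flags byte is the "sampled" flag.
    public static bool IsSampled(string flags) => (Convert.ToInt32(flags, 16) & 0x01) == 0x01;
}
```

For the header shown above, Split returns 67f551c7ec652007 as the parent ID, and the flags value 01 means the caller marked the trace as sampled.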

However, the outcome of:

DistributedTracingData distributedTracingData = DistributedTracingData.TryDeserializeFromString(traceParent);

transaction = Agent.Tracer.StartTransaction($"distri-{origin}", ApiConstants.TypeRequest, distributedTracingData);

was always that transaction.ParentId was null, even though distributedTracingData.ParentId was not null.

After checking the published source code of Elastic.Apm.Model.Transaction, we found that TraceContextIgnoreSampledFalse, despite being deprecated, causes the ParentId to be set to null when TraceState is null:

// If TraceContextIgnoreSampledFalse is set and the upstream service is not from our agent (aka no sample rate set)
// ignore the sampled flag and make a new sampling decision.
#pragma warning disable CS0618
if (configuration.TraceContextIgnoreSampledFalse && (distributedTracingData.TraceState == null
#pragma warning restore CS0618
    || (!distributedTracingData.TraceState.SampleRate.HasValue && !distributedTracingData.FlagRecorded)))
  {
    IsSampled = sampler.DecideIfToSample(idBytes);
    _traceState?.SetSampleRate(sampler.Rate);

    // In order to have a root transaction, we also unset the ParentId.
    // This ensures there is a root transaction within elastic.
    ParentId = null;
  }

In our case, we have solved it by setting TraceContextIgnoreSampledFalse back to the default value false and keeping TraceContinuationStrategy on continue.
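As a sketch of what that looks like as configuration, assuming the agent reads its settings from appsettings.json through IConfiguration (our own setup passes a custom elasticApmConfig object instead, so there the same two values are set on that object):

```json
{
  "ElasticApm": {
    "TraceContextIgnoreSampledFalse": false,
    "TraceContinuationStrategy": "continue"
  }
}
```

If the agent is configured through environment variables, the corresponding names should be ELASTIC_APM_TRACE_CONTEXT_IGNORE_SAMPLED_FALSE and ELASTIC_APM_TRACE_CONTINUATION_STRATEGY; please check these against the configuration reference for your agent version.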

After some searching, we found a note about .NET 5 applications on the page "HTTP configuration options | APM .NET Agent Reference [1.x] | Elastic", but the context does not make clear that it concerns .NET 5 and newer applications. The note might be out of date.