Summary
TraceContinuationStrategy is not a complete successor for TraceContextIgnoreSampledFalse. In Transaction.cs, when TraceContextIgnoreSampledFalse is true, the ParentId of a distributed transaction can be set to null, for example when the incoming tracing data has no TraceState. This detaches the local trace from the caller's trace. For distributed transactions, ensure that TraceContextIgnoreSampledFalse is always set to false.
Explanation
Our software runs distributed across a large number of nodes.
We were unable to use the value of the HTTP header "traceparent" to correlate a trace across the nodes.
All ASP.NET Core website instances were initialized using:
IApmLogger elasticApmLogger = new ElasticApmLogger();
AgentComponents agentComponents = new AgentComponents(elasticApmLogger, elasticApmConfig, null);
Agent.Setup(agentComponents);
HttpDiagnosticsSubscriber httpDiagnosticsSubscriber = new HttpDiagnosticsSubscriber();
services.AddElasticApm(httpDiagnosticsSubscriber);
AspNetCoreDiagnosticSubscriber aspNetCoreDiagnosticSubscriber = new AspNetCoreDiagnosticSubscriber();
services.AddElasticApm(aspNetCoreDiagnosticSubscriber);
SqlClientDiagnosticSubscriber sqlClientDiagnosticSubscriber = new SqlClientDiagnosticSubscriber();
services.AddElasticApm(sqlClientDiagnosticSubscriber);
and each instance registered its own separate trace.
On the receiving side in the network, the traceparent was received by a filter and had a value such as:
00-e5c1244444312af480bae5db1156ff7d-67f551c7ec652007-01
According to Trace Context, the 67f551...007 value is the parent ID to which the local trace should be attached.
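To make the layout of that header value concrete, here is a minimal sketch that splits a version 00 traceparent into its four fields. TraceParentParser is a hypothetical helper written for this post; it is not part of Elastic.Apm and only checks field lengths and the flags byte.

```csharp
using System;
using System.Globalization;

// Hypothetical helper, NOT part of Elastic.Apm: splits a W3C traceparent value
// of the form version-traceid-parentid-flags into its fields.
public static class TraceParentParser
{
    public static bool TryParse(string header, out string version, out string traceId,
        out string parentId, out bool sampled)
    {
        version = traceId = parentId = null;
        sampled = false;
        if (string.IsNullOrWhiteSpace(header)) return false;

        string[] parts = header.Trim().Split('-');
        // Version 00 carries exactly four dash-separated fields.
        if (parts.Length != 4) return false;
        // Expected lengths in hex characters: 2, 32, 16, 2.
        if (parts[0].Length != 2 || parts[1].Length != 32
            || parts[2].Length != 16 || parts[3].Length != 2)
            return false;

        // The last field is one byte of flags, written as two hex digits.
        if (!byte.TryParse(parts[3], NumberStyles.HexNumber,
                CultureInfo.InvariantCulture, out byte flags))
            return false;

        version = parts[0];
        traceId = parts[1];
        parentId = parts[2];
        // Bit 0 of the flags byte is the "sampled" flag.
        sampled = (flags & 1) == 1;
        return true;
    }
}
```

For the value shown above, this yields the trace ID e5c1244444312af480bae5db1156ff7d, the parent ID 67f551c7ec652007, and sampled = true.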
However, the outcome of:
DistributedTracingData distributedTracingData = DistributedTracingData.TryDeserializeFromString(traceParent);
transaction = Agent.Tracer.StartTransaction($"distri-{origin}", ApiConstants.TypeRequest, distributedTracingData);
was always that transaction.ParentId is null, despite distributedTracingData.ParentId being non-null.
After checking the published source code of Elastic.Apm.Model.Transaction, it was found that TraceContextIgnoreSampledFalse, despite being deprecated, ensures that the ParentId is set to null when TraceState is null:
// If TraceContextIgnoreSampledFalse is set and the upstream service is not from our agent (aka no sample rate set)
// ignore the sampled flag and make a new sampling decision.
#pragma warning disable CS0618
if (configuration.TraceContextIgnoreSampledFalse && (distributedTracingData.TraceState == null
#pragma warning restore CS0618
        || (!distributedTracingData.TraceState.SampleRate.HasValue && !distributedTracingData.FlagRecorded)))
{
    IsSampled = sampler.DecideIfToSample(idBytes);
    _traceState?.SetSampleRate(sampler.Rate);
    // In order to have a root transaction, we also unset the ParentId.
    // This ensures there is a root transaction within elastic.
    ParentId = null;
}
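The condition in the excerpt above can be restated as a small stand-alone predicate, which makes it easier to see which inputs cause the ParentId to be dropped. This is only a sketch: ParentIdDecision and its parameter names are hypothetical, and plain booleans stand in for the agent's configuration and DistributedTracingData objects.

```csharp
// Hypothetical stand-in, NOT agent code: mirrors the boolean condition from the
// Transaction.cs excerpt using plain flags.
public static class ParentIdDecision
{
    public static bool UnsetsParentId(
        bool ignoreSampledFalse, // configuration.TraceContextIgnoreSampledFalse
        bool hasTraceState,      // distributedTracingData.TraceState != null
        bool hasSampleRate,      // TraceState.SampleRate.HasValue
        bool flagRecorded)       // distributedTracingData.FlagRecorded
    {
        return ignoreSampledFalse
            && (!hasTraceState || (!hasSampleRate && !flagRecorded));
    }
}
```

In our situation (TraceContextIgnoreSampledFalse set to true, TraceState null), UnsetsParentId(true, false, false, true) returns true even though the sampled flag was 01. With the option set to false, the predicate is false for every input, so the ParentId is kept.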
In our case, we have solved it by setting TraceContextIgnoreSampledFalse back to the default value false and keeping TraceContinuationStrategy on continue.
After searching, a note was found on the page "HTTP configuration options | APM .NET Agent Reference [1.x] | Elastic" regarding .NET 5 applications, but the context does not make clear that it concerns .NET 5 and newer applications. The note might be out of date.