Hi,
We have a number of microservices implemented in Go that talk to each other over gRPC.
We recently started logging the trace ID in our services so that we can correlate logs across them. We use the Zap logger and extract the APM information to create a new logger as described here.
However, it seems that the trace ID is only sent from one service to the next if the current span is sampled. If the span isn't sampled, the next service logs a (seemingly) random trace ID and we're unable to follow the request across services.
We instrument our clients with the interceptor like this:
conn, err := grpc.Dial(addr, grpc.WithInsecure(), grpc.WithUnaryInterceptor(apmgrpc.NewUnaryClientInterceptor()))
Looking at the interceptor's source code, we can see that the traceparent metadata isn't added to the outgoing context if the span is dropped:
func startSpan(ctx context.Context, name string) (*apm.Span, context.Context) {
	span, ctx := apm.StartSpan(ctx, name, "external.grpc")
	if span.Dropped() {
		return span, ctx
	}
	traceparentValue := apmhttp.FormatTraceparentHeader(span.TraceContext())
	md, ok := metadata.FromOutgoingContext(ctx)
	if !ok {
		md = metadata.Pairs(traceparentHeader, traceparentValue)
	} else {
		md = md.Copy()
		md.Set(traceparentHeader, traceparentValue)
	}
	return span, metadata.NewOutgoingContext(ctx, md)
}
This means that with a sample rate lower than 1.0, unsampled requests lose their trace ID at every gRPC hop, so correlating logs by trace ID doesn't work as intended.
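To illustrate what we expected instead, here is a minimal, standard-library-only sketch. It assumes the traceparent value follows the W3C Trace Context layout (version-traceid-spanid-flags), where the last field carries the sampling decision, so a non-sampled request could still forward its trace ID. The traceContext type, formatTraceparent, and outgoingMetadata are hypothetical names for illustration, not the agent's API, and the lower-case header key is our assumption about what the interceptor uses.

```go
package main

import (
	"encoding/hex"
	"fmt"
)

// traceContext is a hypothetical stand-in for the agent's trace context;
// it is NOT the go.elastic.co/apm type.
type traceContext struct {
	TraceID [16]byte
	SpanID  [8]byte
	Sampled bool
}

// formatTraceparent renders the context as version-traceid-spanid-flags
// (assumed W3C Trace Context layout). Bit 0 of the flags field records
// whether the request is sampled.
func formatTraceparent(tc traceContext) string {
	flags := byte(0)
	if tc.Sampled {
		flags = 1
	}
	return fmt.Sprintf("00-%s-%s-%02x",
		hex.EncodeToString(tc.TraceID[:]),
		hex.EncodeToString(tc.SpanID[:]),
		flags)
}

// outgoingMetadata copies md and always sets the traceparent entry,
// regardless of the sampling decision. The header key is an assumption.
func outgoingMetadata(md map[string]string, tc traceContext) map[string]string {
	out := make(map[string]string, len(md)+1)
	for k, v := range md {
		out[k] = v
	}
	out["elastic-apm-traceparent"] = formatTraceparent(tc)
	return out
}

func main() {
	tc := traceContext{
		TraceID: [16]byte{0x0a, 0xf7, 0x65, 0x19, 0x16, 0xcd, 0x43, 0xdd,
			0x84, 0x48, 0xeb, 0x21, 0x1c, 0x80, 0x31, 0x9c},
		SpanID: [8]byte{0xb7, 0xad, 0x6b, 0x71, 0x69, 0x20, 0x33, 0x31},
	}

	// Not sampled: the trace ID is still present, only the flag is 00.
	fmt.Println(outgoingMetadata(nil, tc)["elastic-apm-traceparent"])

	// Sampled: same trace ID, flag is 01.
	tc.Sampled = true
	fmt.Println(outgoingMetadata(nil, tc)["elastic-apm-traceparent"])
}
```

With propagation like this, the receiving service could reuse the incoming trace ID for its log fields even when it records no span data for the request.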
Our questions are:
- Are we doing anything wrong?
- Have we misunderstood how distributed tracing works?
- Is this a bug in the module?
We're using version 1.4.0 of the APM gRPC module.