Trace id is inserting prohibited character in header

APM Server version: 1.29.0

APM Agent language and version:
[INFO] +- co.elastic.apm:apm-agent-api:jar:1.29.0:compile
[INFO] +- co.elastic.apm:elastic-apm-agent:jar:1.29.0:compile

Description of the problem including expected versus actual behavior. Please include screenshots (if relevant):
Under scale run - API call is getting rejected once in while by webclient(one in 100 times)
netty is rejecting the call because - there is a prohibited character getting inserted as part of the traceparent. Also traceparent is getting modified.

This is not happening very often. When it happens functionality is broken as the service to service call fails.

a header value contains a prohibited character '': 00-4a4a577c4bb47fb90ade48ffd8ef9fa3cba3a3635c53c76d676d6d-01

a header value contains a prohibited character '': 00-44f1278d295ed390916bdfb091679-30e5a88ffe7277267b--0101

a header value contains a prohibited character '': 00-c5708869a028e0c8847e63a42f675dc3-0c8847e63a42f675dc3df18fab8-01bedf18fab8-01

That prohibited character is: \0 (0x0) NUL

java.lang.IllegalArgumentException: a header value contains a prohibited character '': 00-c5708869a028e0c8847e63a42f675dc3-0c8847e63a42f675dc3df18fab8-01bedf18fab8-01
	at io.netty.handler.codec.http.DefaultHttpHeaders$HeaderValueConverterAndValidator.validateValueChar(DefaultHttpHeaders.java:481)
	at io.netty.handler.codec.http.DefaultHttpHeaders$HeaderValueConverterAndValidator.convertObject(DefaultHttpHeaders.java:461)
	at io.netty.handler.codec.http.DefaultHttpHeaders$HeaderValueConverterAndValidator.convertObject(DefaultHttpHeaders.java:452)
	at io.netty.handler.codec.DefaultHeaders.setObject(DefaultHeaders.java:496)
	at io.netty.handler.codec.http.DefaultHttpHeaders.set(DefaultHttpHeaders.java:189)
	at org.springframework.http.client.reactive.ReactorClientHttpRequest.lambda$applyHeaders$3(ReactorClientHttpRequest.java:127)
	at java.base/java.util.Map.forEach(Unknown Source)
	at org.springframework.http.client.reactive.ReactorClientHttpRequest.applyHeaders(ReactorClientHttpRequest.java:127)
	at org.springframework.http.client.reactive.AbstractClientHttpRequest.lambda$null$0(AbstractClientHttpRequest.java:142)

This started happening after upgrading to 1.29.0
Steps to reproduce:

  1. Make an api call(using netty webclient) from within the completable future
  2. Run it in a loop

Provide logs and/or server output (if relevant):

WebClient.Builder webClientBuilder = WebClient.builder()
.exchangeStrategies(ExchangeStrategies.builder()
        .codecs(configurer -> configurer
                .defaultCodecs()
                .maxInMemorySize(BYTE_COUNT))
        .build())
.clientConnector(connector)
.defaultHeader(HttpHeaders.CONTENT_TYPE, MediaType.APPLICATION_JSON_VALUE)
.apply(oauth2.oauth2Configuration())
.filter(logRequestFilter())
.filter(tracingHeaderFilter());
    private static ExchangeFilterFunction tracingHeaderFilter() {
        return (clientRequest, next) -> {
            Span span = ElasticApm.currentSpan();
            ClientRequest.Builder filtered = ClientRequest.from(clientRequest);
            span.injectTraceHeaders((name, value) -> {
                        if (name != null && value != null) {
                            filtered.header(name, value);
                        }
                    }
            );
            return next.exchange(filtered.build());
        };
    }

final Map<String, String> copyOfContextMap = getCopyOfContextMap();
List<List<String>> batchOfIds = Lists.partition(ids, 5);
batchOfIds.forEach(currentBatchIds -> currentBatchIds
        .stream()
        .map(id -> CompletableFuture.runAsync(() -> {
            try {
                MDC.setContextMap(copyOfContextMap);
                // API call to get entity by id
            } catch (Exception e) {
               // throw e
            } finally {
                MDC.clear();
            }
        })).collect(Collectors.toList()).forEach(CompletableFuture::join));


Hi @Chetan_K , welcome to our forum!

The symptoms you describe seem typical to concurrency issues. It is likely that your custom trace context handling contains race condition that has higher failure rates as contention increases.

In case you are not aware, the latest version (1.33.0) introduces auto-instrumentation for Spring WebClient (currently experimental).

So first thing I would try is upgrade to agent version 1.33.0 and set the followings:

If this works, you can just replace your custom instrumentation with the new automatic one.
Otherwise, try debugging where things happen out of order. Because this is multithreaded, you may need to add some debugging. In addition, setting the agent's log_level to debug may be useful as well.

I hope this helps!

Thanks @Eyal_Koren for the quick help.

Have tried to debug this further.

span.injectTraceHeaders((name, value) -> {
                        if (name != null && value != null) {
                            log.trace("Setting apm header: {}={}", name, value);
                            filtered.header(name, value);
                        }
                    }
            );

Success usecase

mdc={
	transaction.id=4764dc353cc9bc9b
	trace.id=ce8c3b9a190d8eb702da2f6c270b4b17
}

get current span traceId: ce8c3b9a190d8eb702da2f6c270b4b17
Setting apm header: tracestate=es=s:1
Setting apm header: traceparent=00-ce8c3b9a190d8eb702da2f6c270b4b17-4764dc353cc9bc9b-01
Setting apm header: elastic-apm-traceparent=00-ce8c3b9a190d8eb702da2f6c270b4b17-4764dc353cc9bc9b-01

Failure usecase

mdc={
	transaction.id=2ca180562df75aa6, 
	trace.id=5d26a244d68d8eb8fd94c7d1adda7961
}

get current span traceId: 5d26a244d68d8eb8fd94c7d1adda7961
Setting apm header: tracestate=es=s:1
Setting apm header: traceparent=00-00-55d26a2444d68d8eb8fc7d17d1adda7961--2ca180562df75aa6180562df75aa6--0011
Setting apm header: elastic-apm-traceparent=00-00-55d26a2444d68d8eb8fc7d17d1adda7961--2ca180562df75aa6180562df75aa6--0011

As you can see injectTraceHeaders is adding the invalid headers
From within service, we are not tampering the traceids.
As you can see, from within span.injectTraceHeaders traceparent values are tampered. Not sure from where \0 is getting added as well which is not shown.
Can you think of anything which is going wrong here.?? I don't see any exceptions around as well

I am thinking of adding the validation for \0, so that at least webclient is not broken.
We are evaluating the auto instrumentation as well.

Thanks for the useful info!
I agree it looks like it's related to concurrent computation of the outgoing traceparent header.
I tried something to make this part thread-safe. Please try the corresponding snapshot and see if it resolves the issue.

Thanks @Eyal_Koren for the quick turn around.

Locally it is not reproducible.
If this change making into some release, I can upgrade and verify in higher stacks.
For now, I am unblocked by adding a workaround to eliminate the invalid characters in the traceparaent.

Thanks for the feedback!
Was this issue reproducible locally with 1.33.0?

@Chetan_K your input would be very useful for us.
If you can verify that the problem is not reproducible with the provided 1.33.1-SNAPSHOT where it is reproducible with 1.33.0, it would be a very good indication for us to include it in the next release.
Your current workaround means that you are losing transactions and get incomplete traces.

This topic was automatically closed 20 days after the last reply. New replies are no longer allowed.