Hi, adding @Traced is not fixing it, but normal methods annotaed with it are reported as expected. I added them for the log, so that one can see that the methods are finished and other spans are working correctly.
Ahh, I see. I assume we are talking about the transaction.TransactionService/Description span.
In the debug log I can see that the io.grpc.ClientCall.Listener#onClose instrumentation is applied to two listener types. This is the instrumentation that takes care of ending the client spans. Are you using a custom listener as well? Can you verify that this listener API is actually being called for the traced client calls? If so, can you confirm that onClose() is being invoked with a non-null io.grpc.Status?
I don't use any custom listeners (in this test), but grpc is running io.grpc.internal.DelayedClientCall$DelayedListener::onClose which calls io.grpc.stub.ClientCalls$UnaryStreamToFuture::onClose, both with Status.OK. Everything running in a io.grpc.stub.ClientCalls::blockingUnaryCall.
I generated a stacktrace:
java.lang.Exception
at io.grpc.stub.ClientCalls$UnaryStreamToFuture.onClose(ClientCalls.java:524)
at io.grpc.internal.DelayedClientCall$DelayedListener$3.run(DelayedClientCall.java:463)
at io.grpc.internal.DelayedClientCall$DelayedListener.delayOrExecute(DelayedClientCall.java:427)
at io.grpc.internal.DelayedClientCall$DelayedListener.onClose(DelayedClientCall.java:460)
at io.grpc.internal.ClientCallImpl.closeObserver(ClientCallImpl.java:553)
at io.grpc.internal.ClientCallImpl.access$300(ClientCallImpl.java:68)
at io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl$1StreamClosed.runInternal(ClientCallImpl.java:739)
at io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl$1StreamClosed.runInContext(ClientCallImpl.java:718)
at io.grpc.internal.ContextRunnable.run(ContextRunnable.java:37)
at io.grpc.internal.SerializingExecutor.run(SerializingExecutor.java:123)
at io.grpc.stub.ClientCalls$ThreadlessExecutor.waitAndDrain(ClientCalls.java:740)
at io.grpc.stub.ClientCalls.blockingUnaryCall(ClientCalls.java:149)
at transaction.TransactionServiceGrpc$TransactionServiceBlockingStub.description(TransactionServiceGrpc.java:156)
at apmTests.Main.main(Unknown Source)
... Method match for instrumentation ListenerClose: ... matches public void io.grpc.internal.DelayedClientCall$DelayedListener.onClose(io.grpc.Status,io.grpc.Metadata)
Which means that this listener is instrumented as expected by ListenerClose, so when the onClose() method exits after being called with a non-null Status, it should end the client span.
If you can setup a debug environment with our agent code and debug co.elastic.apm.agent.grpc.GrpcHelper#exitClientListenerMethod(), you may be able to find out what's not working as expected.
In GrpcHelperclientCallStartEnter is called with what seems like correct parameters, the remove call inside returns null though. clientCallSpans contains a single item, the pending call.
enterClientListenerMethod is called three times afterwards, with the DelayedListener as a parameter. clientCallListenerSpans is empty, the get returns null. The last time it is called is during the onClose of the DelayedListener (onEnter instrumentation).
onClose of the DelayedListener runs
Program ends
exitClientListenerMethod doesn't seem to run at any time.
ClientCallImplInstrumentation$ListenerClose.onExit is called, but the span passed is null, so the call to exitClientListenerMethod gets skipped. listener is the DelayedListener, status is OK and thrown is null.
Thank you for your help!
It looks like the key in clientCallStartEnter that is searched for is:
OK, so the issue is that the span is not registered properly in the first place.
What's responsible for the span creation and registration is ChannelInstrumentation, for which I do see matches in the log.
Try to debug this instrumentation advice methods (onEnter and onExit) and see if you get anything. The cause may be that we are not instrumenting the relevant io.grpc.Channel instrumentation.
What you can also do is debug the io.grpc.Channel#newCall interface method and see if there are invocations we miss entirely.
We exclude io.grpc.internal.ManagedChannelImpl$RealChannel from instrumentation, maybe it is required in your case, but look if there is something else.
Also, which gRPC version are you using?
Oh, I completely forgot to add the gRPC version. I'm using 1.39.0. I just now tried it with much older versions, it looks like it broke after gRPC 1.32.2. With 1.32.2 it works as expected, after that it does nothing. In 1.33.0 it looks like they changed some channel subclassing, but that was reverted in 1.33.1. Neither of those versions (or any subsequent) work anymore with APM.
OK. I tried something real quick to check a direction.
Please try out this snapshot (ignore the color, the gRPC client test is expected to fail due to an extra span) and see what you get.
I just restored instrumentation for io.grpc.internal.ManagedChannelImpl$RealChannel, which we excluded before. It may cause extra spans, but hopefully we can pinpoint the exclusion of this Channel implementation as the culprit.
I am not sure how your actual project is setup, but provided dependencies are added to the test classpath, so this may affect your results. The agent should never be added as a dependency, only attached using one of the documented methods.
Please see if this is related or what is missing from the referenced project.
OK, but if it's on the classpath, it may be problematic.
If you did not already, please try to run the test with the snapshot I provided without this agent dependency.
I tried running it again without in in the classpath and with your Snapshot. I got (basically) the same log as above.
The project should work, you have to compile it first so that the grpc code gets generated and the IDE recognices it. Then MainServer needs to be startet (it's own process so that it doesn't get any apm instrumentalization), and MainClient connects to the server when started. Nothing in a test framework, to not add additional error sources.
However, keeping everything the same and only restoring instrumentation for io.grpc.internal.ManagedChannelImpl$RealChannel, the span is created as expected:
So I do believe this channel's instrumentation is related, but we need to figure out what's working differently on your setup. Please try to use the provided snapshot and debug GrpcHelper#clientCallStartEnter and GrpcHelper#registerSpan again, see what happens. The registerSpan should be called twice per client call and store two keys in the clientCallSpans map, then clientCallStartEnter may be called only once, but it should find the key. Please let me know if you realize why it doesn't.
@YvesAndre I started a big change in our gRPC instrumentation, part of which should fix the issue you reported. Can you please test the branch's snapshot and see how it works for you?
Apache, Apache Lucene, Apache Hadoop, Hadoop, HDFS and the yellow elephant
logo are trademarks of the
Apache Software Foundation
in the United States and/or other countries.