Bugreport: APM Plugin GRPC in combination with grpc-spring-boot-starter

Hey Elastic APM Team,

I would like to report a bug for: GitHub - elastic/apm-agent-java: Elastic APM Java Agent (apm-agent-plugins/apm-grpc/apm-grpc-plugin)

APM Agent language and version: Java, 1.39.0

tl;dr

A warning will be shown for every grpc-request, if those 2 dependencies are both used in a project:

  1. GitHub - elastic/apm-agent-java: Elastic APM Java Agent (apm-agent-plugins/apm-grpc/apm-grpc-plugin, newest version 1.39.0)
  2. GitHub - yidongnan/grpc-spring-boot-starter: Spring Boot starter module for gRPC framework. (newest version 2.14.0.RELEASE) -- The most commonly used spring-boot / grpc integration
    I do not think it is really the fault of the second dependency, I debugged into the problem a bit and I think this dependency uses a feature of the official grpc library, that is not commonly used otherwise.

Bug description

The first GRPC request of a freshly started micro-service throws the following exception:

WARN  co.elastic.apm.agent.impl.transaction.AbstractSpan - End has already been called: 'de.testservice.TestService/> testRequest' 00-00000000000000000000000000000000-0000000000000000-00 (000000)
java.lang.Throwable: null
	at co.elastic.apm.agent.impl.transaction.AbstractSpan.end(AbstractSpan.java:584) ~[elastic-apm-agent-00000000000000000000000000000000-00000000000000000000000000000000.jar:1.39.0]
	at co.elastic.apm.agent.impl.transaction.AbstractSpan.end(AbstractSpan.java:559) ~[elastic-apm-agent-00000000000000000000000000000000-00000000000000000000000000000000.jar:1.39.0]
	at co.elastic.apm.agent.grpc.GrpcHelper.exitServerListenerMethod(GrpcHelper.java:272) ~[elastic-apm-agent-00000000000000000000000000000000-00000000000000000000000000000000.jar:1.39.0]
	at co.elastic.apm.agent.grpc.ServerCallListenerInstrumentation$OnComplete$OnCompleteAdvice.onExit(ServerCallListenerInstrumentation.java:166) > ~[elastic-apm-agent-00000000000000000000000000000000-00000000000000000000000000000000.jar:1.39.0]
	at io.grpc.Contexts$ContextualizedServerCallListener.onComplete(Contexts.java:110) ~[grpc-api-1.51.0.jar!/:1.51.0]
	at io.grpc.internal.ServerCallImpl$ServerStreamListenerImpl.closedInternal(ServerCallImpl.java:375) ~[grpc-core-1.51.0.jar!/:1.51.0]
	at io.grpc.internal.ServerCallImpl$ServerStreamListenerImpl.closed(ServerCallImpl.java:365) ~[grpc-core-1.51.0.jar!/:1.51.0]
	at io.grpc.internal.ServerImpl$JumpToApplicationThreadServerStreamListener$1Closed.runInContext(ServerImpl.java:923) ~[grpc-core-1.51.0.jar!/:1.51.0]
	at io.grpc.internal.ContextRunnable.run(ContextRunnable.java:37) ~[grpc-core-1.51.0.jar!/:1.51.0]
	at io.grpc.internal.SerializingExecutor.run(SerializingExecutor.java:133) ~[grpc-core-1.51.0.jar!/:1.51.0]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source) ~[?:?]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source) ~[?:?]
	at java.lang.Thread.run(Unknown Source) [?:?]

Every other request just logs a warning:

WARN  co.elastic.apm.agent.impl.transaction.AbstractSpan - End has already been called: 'de.testservice.TestService/> testRequest' 00-00000000000000000000000000000000-0000000000000000-00 (000000)

Minimal reproducible example

I tried to create a service of minimal size, that can be used to reproduce the bug. To start the code, set the environment variable ELASTIC_APM_SECRET_TOKEN and then:

mvn clean install spring-boot:run

The sourcecode can be found here: 5.189.140.198/test-service.zip

My thoughts on what goes wrong

I tried to debug a bit into it myself, because I thought at first the bug might be in the other dependency. I think the problem is as follows:

grpc-spring-boot-starter uses the officially supported method:
io.grpc.Contexts.interceptCall
to integrate grpc into spring-boot. This constructs a ContextualizedServerCallListener, which overrides the method onComplete, but also calls super.onComplete in this overwritten method.
But now
co.elastic.apm.agent.grpc.OnComplete.OnCompleteAdvice
gets called twice with the same listener, once for the onComplete and once for the super.onComplete (instead of just once as usually, if another listener is used)
In the debugger we now see the following invocations:

GrpcHelper.getInstance().enterServerListenerMethod(A);
GrpcHelper.getInstance().enterServerListenerMethod(A);
GrpcHelper.getInstance().exitServerListenerMethod(thrown, A, transaction, Status.CANCELLED);
GrpcHelper.getInstance().exitServerListenerMethod(thrown, A, transaction, Status.CANCELLED);

The last exitServerListenerMethod invocation now writes the warning, because A was already exited before.

I am not 100% sure that this is the problem, but it looks like if I debug into it with and without the grpc-spring-boot-starter dependency in use.

Conclusion

For now, I would be happy if you could confirm or deny that bug, because we see a lot of those warnings in all our micro-services. If you need anything else from me, I am happy to help.

Kind regards,
Hilmar

Hi @hackermann, welcome to the forum and thanks for reporting this.

As you've managed to create a minimal project to reproduce this, do you mind pushing it directly on a github repo so we could easily reproduce it on our side ?

This extra warning definitely looks like a bug in what you described, so we'd better investigate that while you have a reproducible way to trigger it at hand.

Hey @Sylvain_Juge

yes sure, I created the following repo with the code:
Github Repo

Kind regards,
Hilmar

Hi @hackermann ,

Thanks for providing a sample project, that made reproducing the issue trivial !!

In this case there is effectively a call to onCancel which is a bit odd to me as the request processing completes, and I thought that this kind of call only happened when there was a timeout or when the client stopped waiting for the response.

However, it definitely looks like a corner case and due to an implementation detail, as the transaction seems to be perfectly captured otherwise I think it's fine to surgically avoid double-ending the transaction when this happens.

Could you test with this agent snapshot (to be used with -javaagent unlike the provided project that used the programmatic attach), hopefully the issue is gone and we can include this fix for the next release.

Hey @Sylvain_Juge ,

Thanks for the fast reply.
I just tried the snapshot on our original code, and it worked! I did not see any warnings anymore.

Kind regards,
Hilmar

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.