Java APM Client: transfer transaction information when using BlockingQueue instead of ExecutorService

If you are asking about a problem you are experiencing, please use the following template, as it will help us help you. If you have a different problem, please delete all of this text :slight_smile:

APM Agent language and version:
Java APM Agent
'co.elastic.apm:apm-opentracing:1.6.2',
'co.elastic.apm:apm-agent-api:1.6.2'

Our Application use a java.util.concurrent.BlockingQueue to decouple the rest controller from the remote service.
How can I instrument the BlockingQueue-Class to transfer the current transaction to our remote service, like the ExecutorService. (see https://www.elastic.co/guide/en/apm/agent/java/current/supported-technologies-details.html#Async Frameworks

How can I transfer the current transaction information with the Public API, when I start new threads without the ExecutorService? Is there any documentation available?

Kind regards
Peter

Welcome to the Elastic APM forum :tada:

Those are interesting questions. Normally, you would either instrument something (eg BlockingQueue), preferably through an agent plugin, or use the public API.

Assuming that using the public API is an option for you, let's see if this can work for you:

  1. Use your blocking queue with a wrapper type that wraps the queued elements alongside a co.elastic.apm.api.Span object
  2. When you add an element to the queue, add it alongside a Span object. Now, it depends on how you want your transaction lifecycle to be managed:
    • if you want the thread reading from the queue to end and close the transaction - add the current transaction to the queued element
    • if you want the transaction to be ended by the thread adding to the queue, but want the context to be used by the thread reading from the queue, create a span and add it to the queue element.
  3. Find a proper method that is invoked on the reading thread and add manual instrumentation to it:
    • when this method starts- extract the span object from the queued element and activate it. If this is a span you created especially for the context delegation (and not a transaction) - you should also set the start timestamp accordingly.
    • when this method ends, end and deactivate the span/transaction

For other types of async stuff, try to follow a similar approach.

I hope this makes sense..
Eyal.

Kind regards for your answer.
Do you have an example how to implement these solution with a plugin, specially who to bind the Span object to my element when I add the element to the queue and how to get the span object from from the element when I read the element from the queue.

Kind regards
Peter

Peter,

What I meant is that a solution for doing that would be either by instrumentation (eg of BlockingQueue implementations in this case) or through the API, but not both (as your initial post seemed to ask about both).

Writing a robust, reliable, generic and performant enough plugin for such a use case is not trivial, but I believe you can use the public API to achieve what you want.

I just opened this pull request to demonstrate how you can do that.
In this case I used a wrapper to wrap the queued elements alongside an Elastic APM span, but that could be done in other ways of course. My assumption is that since you are able to use the API, you are also able to adjust your code to change the queue usage. You should find the way that suits your code the most with regard to where and how you actually correlate queued elements to spans.

You should be able to apply the concept demonstrated in this PR to your custom threads as well.

I hope this helps.
Eyal.

Interesting discussion! I was also facing kind of same issue.
Passed the context (tracing information) explicitly in my application code to handle such scenario. Though it works fine but problem with this approach is I need to have dependency to elastic-apm-agent jar for this.

This is approach works fine with the integration type "Self-Attach" i.e. attaching using -javaagent however if we want to use decoupled independent attacher, it will fail as attacher comes with its own elastic-apm-agent jar and cause LinkageError.

Do you think its worth to have an attacher which could inturn use elastic-apm-jar from the classpath and doesn't try to load its own.

I'm curious why you didn't use the public API for that?

Well I tried with public API first, but apparently it's creating new transaction and isn't able to connect to the parent .

With Public API:
transaction = ElasticApm.startTransactionWithRemoteParent(key -> someObject.getTraceId());

With Agent API :
ElasticApmInstrumentation.tracer.startTransaction( TraceContext.fromTraceparentHeader(), traceParentProperty, this.getClass().getClassLoader());

The second approach rightly creates child transaction.

Moreover as discussed on some other thread while dettaching agent with elastic.apm.active=false it only stops instrumentation and the underlying threads remain idle forever. With APM jar I could call ElasticApmInstrumentation.tracer.stop(); which actually stops everything and remove the threads created by agent.

Why do you want to create a child transaction in the same process? That's what spans are meant for which you can create via Transaction#startSpan.

Thanks for the revert, this is bit tricky.

The main transaction is initiated when a JMS message received on Queue. The transaction continues until it finishes executing onMessage method.

So for instances if the transaction is handed over to LMAX Disruptor during execution of onMessage method. Agent will not be able to instrument the EventHandler for the same transaction. To create span for these LMAX EventHandler we can explicitly instrument them using trace_methods.

This traced method will create new transaction and cannot link back to parent. Here I can use the tracing information from parent (JMS Transaction) and with ElasticApmInstrumentation.tracer.startTransaction( TraceContext.fromTraceparentHeader(), traceParentProperty, this.getClass().getClassLoader());
can create child transaction.

Is there better approach?

You can create a child span in the onMessage method and hand that over to the disruptor. In the EventHandler, you can set the start timestamp and the name of that span and activate it. See also https://github.com/elastic/apm-agent-java/blob/master/apm-agent-plugins/apm-api-plugin/src/test/java/co/elastic/apm/api/BlockingQueueContextPropagationTest.java

If you really wanted to have two distinct transactions, you can also transfer the context to a map (span.injectTraceHeaders((name, value) -> map.put(name, value))) which you can hand over to the disruptor. In the EventHandler, you can then do ElasticApm.startTransactionWithRemoteParent(key -> map.getHeader(key)).

In your other thread, I'd also be interested to hear why having the agent threads around when active=false is a problem for your use case.

Thanks a lot!
The approach works well in both cases, while transferring the span and transaction.

Regarding other thread, for me definition of detaching is running application as if agent was never connected to it. Free all the resources it capture upon attachment. Also there could be chances that some instrumentation plugins not using active flag. Though setting active to false will not report to APM Server we can still see instrumentation happening is such cases in the apm log.
I have observed that JMS plugin isn't using active flag and instrumenting the calls irrespective of the value of active flag. JMS instrumentation stops only when we set value for ElasticApmInstrumentation.tracer to null. Might need to raise issue on GIT to fix that.

Now I am using reflection to access the ElasticApmInstrumentation class from agent and using public API for instrumenting internal calls.
Performing following actions as part of dettachment.

  • set active flag to false;
  • invoke ElasticApmInstrumentation.tracer.stop()
  • set value for ElasticApmInstrumentation.tracer to null.

Please elaborate - what exactly do you see still happening when active==false? Spans/transactions created? Messages get tagged?

Oh there is more to it. As confirmed earlier setting active=false doesn't kill APM Agent threads and calling ElasticApmInstrumentation.tracer.stop() does so.

Hence if we call the stop method straight after setting active=false the configuration thread might get killed before it actually sets active=false. In such scenario It continues to create span and transaction with valid tracing IDs but don't publish data to APM Server as data sending thread is also killed in stop method.

But in this case if we also set ElasticApmInstrumentation.tracer=null the APM Agents actually shuts down which I believe is complete dettachment. We dont see anything happening in the APM log and no running threads in JVM. We can see following shut down message in the APM log:

2019-08-08 04:58:22.665 [RMI TCP Connection(13)-10.199.92.86] DEBUG co.elastic.apm.agent.bci.MatcherTimerLifecycleListener - Total time spent matching: 1,356 ,255,300ns | Advice name | Type ns | Method ns | | ApacheHttpAsyncClientInstrumentation | 1,076,625,285 | 0 |

Now if we just set active=false and don't call the stop method JMS will not create tracing IDs but in the APM log we can still see its intercepting the calls and doing some stuff. I am also using log correction feature and in this case it stamps string of zeros in tracing id. When agent is shut down using first approach the tracing ids are blank.

2019-08-08 05:36:07,368 INFO [jmsContainer-1] [00000000000000000000000000000000] ae.MyClass.shouldContinue:1163 - null,null,null,0,null,null,FFDPSX,null,null,null,

Following is the snippet of the apm.log after setting active=false

2019-08-08 05:36:07.368 [jmsContainer-1] DEBUG co.elastic.apm.agent.impl.transaction.AbstractSpan - decrement references to 'JMS RECEIVE from topic XXX-MY-TOPIC' 00-00000000000000000000000000000000-0000000000000000-00 (75264a27) (1) 2019-08-08 05:36:07.368 [jmsContainer-1] TRACE co.elastic.apm.agent.impl.transaction.AbstractSpan - decrementing references at java.lang.RuntimeException: This is an expected exception. Is just used to record where the reference count has been decremented. at co.elastic.apm.agent.impl.transaction.AbstractSpan.decrementReferences(AbstractSpan.java:260) at co.elastic.apm.agent.impl.transaction.Transaction.decrementReferences(Transaction.java:224) at co.elastic.apm.agent.impl.transaction.AbstractSpan.deactivate(AbstractSpan.java:211) at com.xxx.MySolaceMessageListener.onMessage(MySolaceMessageListener.java:49)

2019-08-08 04:19:33.572 [jmsContainer-1] DEBUG co.elastic.apm.agent.impl.transaction.AbstractSpan - decrement references to 'JMS RECEIVE from topic XXX-MY-TOPIC' 00-00000000000000000000000000000000-0000000000000000-00 (90c4e8f) (0) 2019-08-08 04:19:33.572 [jmsContainer-1] TRACE co.elastic.apm.agent.impl.transaction.AbstractSpan - decrementing references at java.lang.RuntimeException: This is an expected exception. Is just used to record where the reference count has been decremented. at co.elastic.apm.agent.impl.transaction.AbstractSpan.decrementReferences(AbstractSpan.java:260) at co.elastic.apm.agent.impl.transaction.Transaction.decrementReferences(Transaction.java:224) at co.elastic.apm.agent.impl.ElasticApmTracer.endTransaction(ElasticApmTracer.java:393) at co.elastic.apm.agent.impl.transaction.Transaction.doEnd(Transaction.java:161) at co.elastic.apm.agent.impl.transaction.AbstractSpan.end(AbstractSpan.java:184) at co.elastic.apm.agent.impl.transaction.AbstractSpan.end(AbstractSpan.java:175) at com.xxx.MySolaceMessageListener.onMessage(MySolaceMessageListener.java:49)

You are right about log correlation - should not work when the agent is inactive. Just did a PR for that.

The log snippet you provided seems reasonable right after setting active to false, as the already created transactions and spans need to be ended and cleaned. However, for onMessage and receive we do create transactions when active==false. Those are noop transactions and we don't really do anything with them, but they are causing DEBUG logging, so I can see how this is misleading. I added it as a task to treat that when applying the JMS tracing enhancements.

Regarding the "detachment" - I understand you expect different things when setting active=false but we didn't hear so far that having idle thread alive is a problem. For me, detaching is something else- it means REALLY behave the same as if agent was not installed. This means removing the bytecode we injected to classes in the first place. We have that in mind as something we want to address at some point, and once we get to that, it will probably include the stuff that happens in co.elastic.apm.agent.impl.ElasticApmTracer#stop.