Highly Custom Distributed Tracing?

So I would love to get distributed tracing working for our application, but there is a case where we leave the standard infrastructure and move to a websocket.

Essentially there are cases when we go `API --> Websocket (UWS) --> Node.js Application --> API (Localhost) --> Node.js Worker Cluster (Express routed by PM2)`

In the worker cluster there will be plenty going on, such as db requests. The API is just a localhost Express app with a single route, so I would want to custom-define those pieces but still keep the instrumentation for the db calls and anything else that occurs.

The problem here is that it's essentially all custom once it gets through the websocket, but I would like to tie it all together.

Essentially, how would I pass the right parameters through the websocket so that I can show the request all the way through the custom pieces of the application? (I know I'd need to do all the custom handling of the APM for those pieces.)

Hi Brandon

Thanks for trying out Elastic APM :slight_smile:

The Node.js agent will automatically continue a distributed trace if it's coming in over HTTP/HTTPS. For all other protocols, such as WebSocket, you need to manually encode and decode the metadata needed to continue the trace when sending data between services.
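As a rough sketch of what "manually encode and decode" could look like over a WebSocket: the sender puts the current trace context into the message, and the receiver starts a transaction as a child of it. This assumes the `elastic-apm-node` agent's `currentTraceparent` property and the `childOf` option of `startTransaction`. The envelope shape, the `traceparent` field name, and the helper names `encodeMessage` and `handleMessage` are made up for illustration. The agent is passed in as a parameter so the helpers stay independent of how it is started.

```javascript
// Sender side: wrap the payload together with the current trace context.
// `agent` is assumed to be the started agent, e.g. require('elastic-apm-node').
function encodeMessage(agent, payload) {
  return JSON.stringify({
    // Field name is our own choice; it is null when no transaction is active.
    traceparent: agent.currentTraceparent || null,
    payload,
  });
}

// Receiver side: continue the sender's trace around a synchronous handler.
// For an async handler, end the transaction once its promise settles instead.
function handleMessage(agent, raw, handler) {
  const { traceparent, payload } = JSON.parse(raw);
  const options = traceparent ? { childOf: traceparent } : {};
  // Transaction name and type are illustrative placeholders.
  const transaction = agent.startTransaction('ws message', 'websocket', options);
  try {
    return handler(payload);
  } finally {
    // startTransaction can return null if the agent is not started.
    if (transaction) transaction.end();
  }
}
```

On the sending service you would call something like `socket.send(encodeMessage(apm, data))`, and on the receiving service `handleMessage(apm, rawMessage, processData)`, where `socket`, `data`, and `processData` stand in for your own objects.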

Check out this answer for more details on how to do this:

If you have any other questions, don't hesitate to ask!

Best,
Thomas

Awesome, thanks!

One thing not answered in that post is what the actual headers added by APM by default are. I do end up calling other APIs which use APM in this case, and they should be able to read the right header automatically. Sorry if I missed it in there.

AKA: When I call from my service to the API which also has APM - what header name should I use so it picks up the trace parent and handles it properly?

If you call other services with HTTP/HTTPS, then the agent will automatically add the correct header for you. If you call other services using other protocols, you can just name the header/metadata whatever you want :slight_smile:

So you shouldn't need to know what the name of the HTTP header is. The reason we don't want users to name this header manually is compatibility. We need all our agents (.NET, Java, Ruby, Node.js etc.) to use the same header name, otherwise they will not work together in a distributed trace.

Currently it's called elastic-apm-traceparent, but we're in the process of changing this to just traceparent.

If you have a need to modify or manually set the HTTP header, please let me know as this shouldn't be needed. If so we need to add an API that lets you do this in a way that doesn't require you to know what the header is called.

Well I have a situation where there is a small part that I am not currently planning to add apm to. The websocket server acts as a proxy in some cases and not sure I want apm there since it would need to be completely custom anyway.

I am passing the trace parent along to the websocket now, and the websocket server then calls the worker via HTTP. The worker is running APM, so in this case I need to add the HTTP header myself so that the transaction can continue once it is through the proxy.

Even if I did add APM there, I am fairly certain it wouldn't work (how would it know the trace parent in this case?). So maybe it would make sense to have a library for custom services like this, to do things like adding the header? Not sure.

I will probably just end up building out the whole websocket server with transactions in the end :stuck_out_tongue: but not sure when that would be.

Guess I will just add both headers so it will work once you guys change it?

Yes, that should do the trick.
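For anyone in the same spot, here is a minimal sketch of the "add both headers" approach for a proxy that does not run the agent itself. The two header names are the ones mentioned in this thread. The helper name `withTraceHeaders` and the variables in the usage comment are hypothetical.

```javascript
// Header names from this thread: the current one and the planned replacement.
const TRACE_HEADER_NAMES = ['elastic-apm-traceparent', 'traceparent'];

// Return a copy of `headers` with the trace context set under both names.
// If no usable traceparent was received, the headers are returned unchanged
// so the downstream agent simply starts a new trace.
function withTraceHeaders(traceparent, headers = {}) {
  const result = { ...headers };
  if (typeof traceparent === 'string' && traceparent.length > 0) {
    for (const name of TRACE_HEADER_NAMES) {
      result[name] = traceparent;
    }
  }
  return result;
}

// Usage inside the proxy (workerUrl and message are placeholders):
// http.request(workerUrl, {
//   method: 'POST',
//   headers: withTraceHeaders(message.traceparent, { 'content-type': 'application/json' }),
// });
```

Sending the same value under both names means the worker's agent should pick up the trace whichever header name its version reads.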

I do think however, that we should add an API to the agent that allows you to do this without having to actually run the agent. It is on our todo list, but not yet planned.

It would also be nice in the shorter term if you documented this a bit more. It is pretty amazing once you have it, and I never really put time into it because I never figured I could make it work with our special environment / situation.

There isn't much information on distributed tracing at all. People need to see screenshots, examples, etc. to get them excited. This brings a TON of value. Because of it, I already caught what is causing a delay in a request that hits 9 different services along the way, which we had not known the cause of until now (granted, we hadn't put in the time to add all the logging to figure that out yet).

Completely agree. When you originally posted this topic, I created an issue to track this: https://github.com/elastic/apm-agent-nodejs/issues/1155
