How to enable distributed tracing (Traceparent)?

Hi!

I'm working for a big organisation where we barely know which application talks to which, or in what order. Often the calls don't follow a logical flow; it's more of a spiderweb.

And now we want to enable distributed tracing. I have read the guides, but in practice I don't know how to roll this out in the organisation. Do we have to know the flow, or is there an easy way around this?

I'm reading this guide.

Please advise. Is it as simple as this: the first receiving node sets the traceparent, we attach it as metadata on all outgoing traffic, and every other system reads incoming traffic for a traceparent and relates its transactions to that parent? Are all subsystems set up the same way?

When system B talks to system C (in the scenario from the link), is the traceparent still there? It doesn't need to be set again?

And is any system in between Elastic, for example Application Insights, which is W3C compliant, OK to have in this flow as long as it's set up for distributed tracing?

Nice to meet you urbanalho, thanks for your questions.

Before we get to your specific questions, here are a few broad concepts.

By default, if you're using standard HTTP(S)-based services in your organization, Elastic's APM agents will automatically inject and read the traceparent header for you. You don't need to do anything extra. Service A talks to Service B, which talks to Service C. As long as the services are HTTP-based, the header propagation will happen automatically.
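To make the A to B to C hop concrete, here is a minimal sketch in plain Python of what happens to the header on each hop, based on the W3C Trace Context format (version, trace-id, parent-id, flags). This is not Elastic agent code: the names `parse_traceparent` and `outgoing_traceparent` are made up for illustration, and on HTTP traffic the agents do the equivalent for you.

```python
import re
import secrets

# W3C Trace Context, version "00": 32-hex trace-id, 16-hex parent-id, 2-hex flags.
TRACEPARENT = re.compile(r"00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})")


def parse_traceparent(header):
    """Split a version-00 traceparent header into its fields (illustrative helper)."""
    match = TRACEPARENT.fullmatch(header)
    if match is None:
        raise ValueError(f"not a valid traceparent: {header!r}")
    trace_id, parent_id, flags = match.groups()
    return {"trace_id": trace_id, "parent_id": parent_id, "flags": flags}


def outgoing_traceparent(incoming):
    """Build the header a service would send on its own outgoing call.

    The trace-id is copied unchanged; the parent-id becomes the id of
    the calling service's own span.
    """
    ctx = parse_traceparent(incoming)
    own_span_id = secrets.token_hex(8)  # 8 random bytes -> 16 hex characters
    return f"00-{ctx['trace_id']}-{own_span_id}-{ctx['flags']}"


# Header that service A sends to service B (example value from the W3C spec).
a_to_b = "00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01"
# Header that service B then sends to service C.
b_to_c = outgoing_traceparent(a_to_b)

print(parse_traceparent(a_to_b)["trace_id"] == parse_traceparent(b_to_c)["trace_id"])
```

The trace-id is what ties all three services into one trace; only the parent-id changes from hop to hop, which is why nobody needs to know the call flow in advance.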

The document you linked outlines a few exceptions to that.

First: if you want to trace requests starting from browser-based Ajax/XHR requests, you'll need to take the additional steps outlined on that page for the JavaScript RUM agent.

Second: if you have specialized services that are NOT HTTP(S)-based, Elastic APM agents will not automatically create transactions or trace those nodes. In these cases your teams will need to manually instrument these services by starting and stopping transactions, which includes reading the traceparent headers. You can read about a specific example of this in the Distributed tracing and messaging queues docs.
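As a rough illustration of the messaging-queue case, the sketch below carries the traceparent as message metadata, since there are no HTTP headers for an agent to use. It is plain Python with an in-process `queue.Queue` standing in for a real broker; `new_traceparent`, `publish`, `consume` and the message layout are assumptions made for this example, not part of any Elastic API.

```python
import queue
import secrets


def new_traceparent():
    """Originate a trace: random trace-id and span id, sampled flag set."""
    return f"00-{secrets.token_hex(16)}-{secrets.token_hex(8)}-01"


def publish(q, body, traceparent):
    # No HTTP headers exist here, so the traceparent travels as message metadata.
    q.put({"metadata": {"traceparent": traceparent}, "body": body})


def consume(q):
    message = q.get()
    # The consumer reads the metadata; in a real service this value would be
    # handed to the APM agent when the transaction is started manually.
    return message["body"], message["metadata"].get("traceparent")


orders = queue.Queue()  # in-process stand-in for a real message broker
sent = new_traceparent()
publish(orders, "order-123", sent)
body, received = consume(orders)

print(received == sent)
```

In a real service the producer would take the traceparent from its current transaction rather than generate one by hand; the point is only that the value has to ride along with the message.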

With that context set ...

I have read the guides, but in practice I don't know how to roll this out in the organisation. Do we have to know the flow, or is there an easy way around this?

You don't need to know the flow. You just need to identify your services.

Step one will be installing the agents into your services. This will get you 80% of the way there. For some customers this is all that's needed.

Step two will be identifying teams with services that aren't HTTP-based and having them write startTransaction instrumentation snippets. This is often an ongoing process as you and your org decide which services need to be traced.

Step three is having your teams instrument new non-HTTP services by default.

Is it as simple as this: the first receiving node sets the traceparent, we attach it as metadata on all outgoing traffic, and every other system reads incoming traffic for a traceparent and relates its transactions to that parent?

As mentioned, Elastic handles setting the traceparent on the nodes for you with HTTP-based services. If the service is not HTTP-based, you'll need to either set an initial traceparent or read the existing traceparent.

Are all subsystems set up the same way?

Yes. Either you do nothing (HTTP-based services), or you have your service originate the traceparent or read it and pass it on.

When system B talks to system C (in the scenario from the link), is the traceparent still there? It doesn't need to be set again?

If these services are HTTP-based, you need to do nothing. If the services are not HTTP-based, you will need to manually start a transaction. When you manually start a transaction, you'll look for the traceparent header. If it's not there, you'll start a transaction without it (which will originate a new traceparent). If it is there, you'll pass that traceparent header in when you start the transaction.
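The decision described above (continue the trace if a traceparent is present, otherwise originate one) can be sketched in plain Python as follows. This only models the logic; `start_transaction` here is a hypothetical helper rather than the Elastic agent's function of a similar name, and the lowercase `traceparent` dictionary key is an assumption about how your transport exposes metadata.

```python
import re
import secrets

# W3C Trace Context, version "00": 32-hex trace-id, 16-hex parent-id, 2-hex flags.
TRACEPARENT = re.compile(r"00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})")


def start_transaction(incoming_metadata):
    """Return the trace context for a manually started transaction.

    Hypothetical helper: continues the caller's trace when a usable
    traceparent is present, otherwise originates a new trace.
    """
    header = incoming_metadata.get("traceparent", "")
    match = TRACEPARENT.fullmatch(header)
    # The W3C spec treats all-zero trace-ids and parent-ids as invalid.
    if match and match.group(1) != "0" * 32 and match.group(2) != "0" * 16:
        trace_id, parent_id = match.group(1), match.group(2)
    else:
        # Missing or unusable header: this service becomes the root of a new trace.
        trace_id, parent_id = secrets.token_hex(16), None
    return {
        "trace_id": trace_id,
        "parent_id": parent_id,
        "span_id": secrets.token_hex(8),  # this service's own id for the next hop
    }


root = start_transaction({})
child = start_transaction(
    {"traceparent": "00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01"}
)

print(root["parent_id"], child["parent_id"])
```

With a real agent you would hand the incoming header to the agent's own start-transaction call instead of parsing it yourself; check the API docs for your agent's language for the exact name and signature.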

And is any system in between Elastic, for example Application Insights, which is W3C compliant, OK to have in this flow as long as it's set up for distributed tracing?

It should be? Although, to be honest, the above statement is a little broad and I'm not sure I understand it. What I can say is that Elastic's traceparent and tracestate headers implement the W3C Trace Context standard.

Phew! I think that covers it. If there are still gaps in your understanding, let us know and we'll do our best to get you sorted.

Thanks for the really good answers. Wow. Thanks for your time and effort; I'll buy you a beer or five if you come to Gothenburg. 5 thumbs up!
