Control Sample Rate or `recorded` Flag Per Request

In a production setting, I'd like to be able to control my sample rate, or simply enable recording of a distributed trace, on a per request basis.

Is this possible using the existing agents?

Thanks :slight_smile:

Hi and thanks for another great question :slight_smile:

I don't think our agents currently support that. One challenge is that the sampling decision is made up front: if you try to change it afterwards, some spans may already have been discarded or sent to the APM Server.

Could you elaborate on which conditions you would use to base the sampling decision on?

Thanks,
Felix

At the front-end making the initial request :slight_smile:

Are you creating the transactions yourself, with the agent's API, or do you rely on the auto instrumentation of the RUM agent? Do you want to influence sampling for the initial page load or for subsequent AJAX requests? How and based on which properties would you decide whether to sample or not?

On the front-end, I am currently doing nothing, and do not have near-term plans for using the full RUM.

Transactions are created automatically by APM agents in my services.

The front-end currently doesn't know about transactions, and ideally I'd like to restrict its knowledge about them to simply setting a header if possible.

I naively thought that if I just generated a valid traceparent header with the recorded flag set and included it in my HTTP request, my services would pick that header up and use it. Is this not the case?
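To make the idea concrete, here is a minimal sketch of building such a header value by hand. It assumes the `version-traceid-parentid-flags` layout of the published W3C Trace Context format, where the lowest flag bit (`01`) marks the trace as recorded/sampled; the draft in use may differ. `make_traceparent` is a hypothetical helper, not part of any agent, and the parent ID it generates does not refer to a span that anyone actually reports.

```python
import re
import secrets

# Layout assumed from the published W3C Trace Context format:
# version "00", 32 hex chars trace id, 16 hex chars parent id, 2 hex chars flags.
TRACEPARENT_RE = re.compile(r"^00-[0-9a-f]{32}-[0-9a-f]{16}-[0-9a-f]{2}$")


def _nonzero_hex(nbytes: int) -> str:
    """Return random lowercase hex; all-zero IDs are treated as invalid."""
    while True:
        value = secrets.token_hex(nbytes)
        if int(value, 16) != 0:
            return value


def make_traceparent(recorded: bool = True) -> str:
    """Hypothetical helper: build a traceparent value with a synthetic parent."""
    trace_id = _nonzero_hex(16)   # 16 bytes -> 32 hex characters
    parent_id = _nonzero_hex(8)   # 8 bytes -> 16 hex characters
    flags = "01" if recorded else "00"  # assumption: bit 0 means recorded/sampled
    return f"00-{trace_id}-{parent_id}-{flags}"


if __name__ == "__main__":
    print(make_traceparent())
```

Whether a given agent accepts a header like this, and under which header name, is exactly the open question here, so treat the sketch as a starting point for experiments only.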

I want to be able to instrument individual requests a la carte at the moment. In the future I will investigate using RUM to instrument multi-request page loads and such, but for now the scope I am concerned with is smaller!

Thanks!

The traceparent header is meant for creating traces to support distributed tracing, so it is expected to contain a valid parent ID. Since you are not using the RUM agent, there is no parent span in your case.

However, you describe an interesting use case of forcing the backend agent to trace some of the transactions. We can consider using the existing header for that, but it will probably make more sense to use a different and simpler header format for that.
Ideally, how would you expect this feature to look?

Will APM Server discard any transactions or spans which belong to a distributed trace for which the initial transaction/span has not been received by APM? I had kind of hoped that I could just create a valid traceparent value and send it with my initial request, and that would be all that was needed.

I'd like it to be easier to decide myself when or how often I will record a distributed trace.

The simplest solution I think would be if APM Server did not discard a transaction or span which references a parent that doesn't exist. Perhaps that would lead to complications that I don't see, though!

I think another similar issue that I have is that the current W3C draft has no bits reserved for tuning the level of the trace. See here:

It has been suggested to use a tracestate header to transmit this information, and if Elastic APM agents propagate this header to all participants in a distributed trace, then it seems that this problem will be solved.
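As a rough illustration of what carrying such a setting in tracestate could look like, the sketch below assumes the comma-separated `key=value` list format from the W3C Trace Context specification, with the most recently updated entry placed first. The `myapp` key and the `level:2` value are made up for the example, and the sketch does not validate key or value characters.

```python
from typing import List, Tuple

MAX_MEMBERS = 32  # the W3C format allows at most 32 list members


def parse_tracestate(header: str) -> List[Tuple[str, str]]:
    """Split a tracestate value into (key, value) pairs, skipping malformed items."""
    members = []
    for item in header.split(","):
        item = item.strip()
        if not item or "=" not in item:
            continue
        key, value = item.split("=", 1)
        members.append((key, value))
    return members


def set_member(header: str, key: str, value: str) -> str:
    """Add or update one entry, moving it to the front of the list."""
    members = [(k, v) for k, v in parse_tracestate(header) if k != key]
    members.insert(0, (key, value))
    return ",".join(f"{k}={v}" for k, v in members[:MAX_MEMBERS])


if __name__ == "__main__":
    # "myapp" and "level:2" are hypothetical; other vendors' entries are preserved.
    print(set_member("vendor1=abc", "myapp", "level:2"))
```

This only helps if every participant in the trace forwards the header unchanged, which is the propagation behaviour being asked about.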

Orphaned transactions/spans will not be discarded by the server, at least not currently. It is, however, generally expected that if a parent ID is given, the parent exists, so I'm not sure we can guarantee that this will hold true forever.

It sounds like you want something like Zipkin's debug flag, but perhaps with a bit more control? If the agents provided a way of passing a sampler into "startTransaction", would that satisfy your needs?

We're currently considering extending sampling configuration. If your criteria for deciding how to sample are something that could be captured in configuration, it would be of great interest to us to hear more details.

Hmm... if they are not discarded, is it possible that they're there in ES, but won't show up in Kibana APM?

Zipkin's debug flag would be great, but what would be even greater would be if I could define my own flags. Propagation of the tracestate header should give me this.

I'd like for my sampling criteria to be something I can tweak on a per-request basis, where the request is the "first" request in a distributed trace, usually issued by a front-end. This flexibility would allow me to troubleshoot problems on an ad hoc basis by adding some sign to my first request, probably usually in an HTTP header.
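Roughly, the decision I have in mind on the receiving side could be sketched like this. It is not an existing agent API: `should_record` and the `x-force-trace` header name are hypothetical, and the fallback is plain probabilistic sampling at a configured rate.

```python
import random
from typing import Mapping, Optional

FORCE_HEADER = "x-force-trace"  # hypothetical header name, chosen for this sketch


def should_record(
    headers: Mapping[str, str],
    sample_rate: float,
    rng: Optional[random.Random] = None,
) -> bool:
    """Decide up front whether to record the trace started by this request."""
    # HTTP header names are case-insensitive, so normalize before the lookup.
    normalized = {name.lower(): value for name, value in headers.items()}
    flag = normalized.get(FORCE_HEADER, "").strip().lower()
    if flag in ("1", "true"):
        return True  # the caller explicitly asked for this request to be traced
    # Otherwise fall back to ordinary rate-based sampling.
    source = rng if rng is not None else random
    return source.random() < sample_rate


if __name__ == "__main__":
    print(should_record({"X-Force-Trace": "1"}, sample_rate=0.0))
    print(should_record({}, sample_rate=0.0))
```

With something like this, the front-end only needs to set one header on the first request, and the normal sample rate applies to everything else.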

I hope that helps clarify things.
