Sampling rate handling for distributed traces

Hi, we'd like to better understand how transaction_sample_rate is handled and interpreted across a distributed system. Does the value of transaction_sample_rate only affect sampling at the entry-point service, with all succeeding services being sampled because a trace.id is propagated to them, even if those succeeding services have a sampling rate lower than 1.0?

To illustrate, imagine services A and B with the following sampling rates:

A=1.00, B=1.00 : request --> A --> B : sampled 100%
A=0.50, B=1.00 : request --> A --> B : sampled 50%
A=1.00, B=0.50 : request --> A --> B : sampled 100% or 50%?
A=0.50, B=0.50 : request --> A --> B : sampled 50% or 25%?

We are interested in dynamically changing our sampling rates in response to increasing or decreasing traffic, as well as anomalies detected in errors or latencies. As a result, we are wondering whether we should change the sampling rate on all services along a trace path, or only on the entry-point services. This matters because we have services that act as both entry-point and succeeding services.
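To give a rough idea of the kind of adjustment we have in mind, here is a minimal sketch in plain Python. It is not an agent API; the thresholds and the `choose_sample_rate` function are just placeholders for whatever policy we end up with, and how the chosen value reaches transaction_sample_rate is left open (environment variable, redeploy, etc.).

```python
# Illustrative sketch only: pick a transaction_sample_rate for a service based
# on current traffic and error signals. The thresholds are placeholders, and
# how the chosen rate is pushed to the agent is left open.
def choose_sample_rate(requests_per_second: float, error_rate: float) -> float:
    if error_rate > 0.05:            # errors spiking: capture everything
        return 1.0
    if requests_per_second > 1000:   # heavy traffic: shed volume
        return 0.1
    return 0.5                       # steady state

print(choose_sample_rate(200, 0.01))    # 0.5
print(choose_sample_rate(5000, 0.01))   # 0.1
print(choose_sample_rate(200, 0.10))    # 1.0
```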

TIA!

P.S.

AppDynamics has an option to set the sampling rate to 100% for x seconds/minutes from the backend for individual business transactions (the equivalent of distributed traces in Elastic APM), to help capture requests during production diagnostics. We would love to see something similar in Elastic.

The first service determines whether the whole trace should be sampled or not. So in your examples, the sampling rate is always determined by service A (100% in your third scenario, 50% in your fourth). Only if B were invoked directly would its own configured sampling rate be taken into account.
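To illustrate the mechanics (this is a sketch of the principle, not the actual agent code): the decision made at the entry service travels downstream in the trace context header, which uses the W3C traceparent format `00-<trace-id>-<parent-id>-<flags>`, where the 0x01 bit of the flags byte means "sampled". A service that receives such a header simply follows that flag instead of rolling against its own transaction_sample_rate.

```python
# Sketch of head-based sampling propagation; not actual Elastic APM agent code.
import random

def decide_at_entry(sample_rate: float) -> str:
    """Service A: no incoming trace context, so roll against its own rate."""
    sampled = random.random() < sample_rate
    trace_id = "%032x" % random.getrandbits(128)
    span_id = "%016x" % random.getrandbits(64)
    flags = "01" if sampled else "00"
    return f"00-{trace_id}-{span_id}-{flags}"

def decide_downstream(traceparent: str) -> bool:
    """Service B: inherit the decision from the propagated header."""
    return traceparent.rsplit("-", 1)[-1] == "01"

header = decide_at_entry(0.5)       # A=0.50
print(header)
print(decide_downstream(header))    # B follows A, whatever B's own rate is
```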

You may be pleased to hear we're planning to add remote configuration capabilities to the agents.

Thanks for the input! We're also evaluating other sampling strategies, such as rate-limited sampling and tail-based (after-the-fact) sampling.
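As an example of the first idea, rate-limited sampling would cap the number of sampled traces per unit of time instead of using a fixed probability. A conceptual sketch of that idea (not an Elastic APM feature or API):

```python
import time

class RateLimitedSampler:
    """Sample at most max_per_second traces in any one-second window."""

    def __init__(self, max_per_second: int):
        self.max_per_second = max_per_second
        self.window_start = time.monotonic()
        self.count = 0

    def should_sample(self) -> bool:
        now = time.monotonic()
        if now - self.window_start >= 1.0:   # start a new one-second window
            self.window_start = now
            self.count = 0
        if self.count < self.max_per_second:
            self.count += 1
            return True
        return False

sampler = RateLimitedSampler(max_per_second=100)
print(sampler.should_sample())  # True until 100 traces are sampled this second
```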

Thank you very much for confirming the scenarios, Felix; it's super appreciated.

We are also looking forward to the agents' remote configuration capabilities. That would really help our ability to dynamically address latencies and exceptions.

Cheers!
