How does sampling work

Hi
Can you please clarify these questions about sampling:

  1. Do agents push all the metrics to the APM Server irrespective of sampling, with the APM Server choosing which data to ingest based on the sampling percentage?
  2. Or do the APM agents themselves discard some data based on the sampling rate, at the machine level?
  3. If point 1 is how it works, what is the use of having a sampling rate configuration at the agent level?

It's option 2: the Java APM agent performs the sampling before the data is sent to the APM Server.

To be clear, even when transaction sampling is set to a value less than 1.0, the time and duration of every transaction are still recorded. When a transaction is not sampled, it simply does not include its spans or other detailed trace information.
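
As a rough illustration of that behaviour (this is not the Java agent's actual code; `Transaction`, `record_transaction`, and the `rng` parameter are made-up names for this sketch), the sampling decision only controls whether the span details are kept:

```python
import random
from dataclasses import dataclass, field


@dataclass
class Transaction:
    name: str
    duration_ms: float
    sampled: bool
    spans: list = field(default_factory=list)


def record_transaction(name, duration_ms, spans, sample_rate, rng=random.random):
    """Hypothetical agent-side step: always keep the transaction itself,
    keep span details only when the transaction is sampled."""
    sampled = rng() < sample_rate
    kept_spans = list(spans) if sampled else []
    return Transaction(name, duration_ms, sampled, kept_spans)


# With sample_rate=0.0 nothing is sampled, yet name and duration survive.
unsampled = record_transaction("GET /orders", 42.0, ["SELECT orders"], sample_rate=0.0)
# With sample_rate=1.0 everything is sampled and the spans are kept.
sampled = record_transaction("GET /orders", 42.0, ["SELECT orders"], sample_rate=1.0)
```

In both cases the transaction's name and duration are reported; only the second one carries its spans.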

Perhaps also look at dynamic profiling since you appear to be using the Java agent.

I would suggest taking a good look at all the sampling, reporting, minimum span duration, and related settings, as you can highly customize the Java agent's behavior.

https://www.elastic.co/guide/en/apm/agent/java/current/configuration.html

Thanks for the reply, but my question is more about this: if the agent itself discards unsampled data, how do multiple microservices involved in distributed tracing make sure that, when the originating transaction is sampled, they also send their data irrespective of their own sampling rate?

Example: service 1 and service 2 are both on APM, and service 1 calls service 2.
Service 1's sampling rate is 0.1 and service 2's sampling rate is 0.01.

Then, irrespective of service 2's sampling rate, if service 2 sees that the call from service 1 is being sampled, it has to push its data as well (transaction + span data). This will take service 2's sampling rate above 0.01, right?

Can you please clarify the above scenario?

Here is some text from a discussion I had earlier on the subject.

In addition to that, the transferred trace context also contains the sampling decision of the calling transaction, so that the root transaction of the trace makes the sampling decision for all “downstream” transactions of that trace.
This way, distributed tracing still works for 100% of the transactions and the sampling decisions are consistent within traces across different services.
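
A minimal sketch of that idea, assuming the sampling decision travels in a W3C-style `traceparent` header (`version-traceid-parentid-flags`, where the lowest bit of the flags means "sampled"). The function names and the `rng` parameter are hypothetical and not taken from any agent's code:

```python
import random


def is_parent_sampled(traceparent):
    """Return the caller's sampling decision from a `traceparent`
    header, or None if there is no usable header."""
    if not traceparent:
        return None
    parts = traceparent.split("-")
    if len(parts) != 4:
        return None
    try:
        flags = int(parts[3], 16)
    except ValueError:
        return None
    return bool(flags & 0x01)  # lowest bit of trace-flags means "sampled"


def decide_sampling(traceparent, own_sample_rate, rng=random.random):
    """Hypothetical decision step: honour the caller's decision if present;
    otherwise this service is the trace root and applies its own rate."""
    parent = is_parent_sampled(traceparent)
    if parent is not None:
        return parent
    return rng() < own_sample_rate


# Incoming call whose root transaction was sampled (flags = 01).
sampled_header = "00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01"
# Incoming call whose root transaction was not sampled (flags = 00).
unsampled_header = "00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-00"

follows_caller = decide_sampling(sampled_header, own_sample_rate=0.01)
skips_with_caller = decide_sampling(unsampled_header, own_sample_rate=1.0)
```

Here the downstream service's own rate is ignored in both calls, because a decision has already been made upstream; the rate only matters when no header arrives.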

Hopefully that makes sense.

I suggest setting up your services, taking a look, and then tuning from there.

I have already set up a cluster and am running APM.

My question is more about understanding the purpose.

By this statement, we are essentially saying that service 2's sampling rate can shoot above its own configured rate of 0.01, since it has to adhere to the caller's sampling decision.

If there are multiple callers with high sampling rates, then the "called" microservice can overshoot its own sampling rate, since it has to adhere to the sampling decision already taken by the caller.

Example: Services 1, 2, and 3 call Service 4.
Services 1, 2, and 3 are on a 100% sampling rate and Service 4 is on a 1% sampling rate.
Service 4 has to push its data for all the calls from 1, 2, and 3, as every call made by 1, 2, and 3 is being sampled.
Hence, if you look at Service 4's sample rate, it will be 100% instead of 1%.

By this statement, we are essentially saying that service 2's sampling rate can shoot above its own configured rate of 0.01, since it has to adhere to the caller's sampling decision.

The sampling rate configured for an agent only affects traces started at that agent. If the agent is continuing a trace that started elsewhere, it will honour the propagated sampling decision.

Example: Services 1, 2, and 3 call Service 4.
Services 1, 2, and 3 are on a 100% sampling rate and Service 4 is on a 1% sampling rate.
Service 4 has to push its data for all the calls from 1, 2, and 3, as every call made by 1, 2, and 3 is being sampled.
Hence, if you look at Service 4's sample rate, it will be 100% instead of 1%.

Correct. The sampling rate is applied to traces as a whole, not to the transactions of one particular service/agent. In this example, it sounds like no traces originate at Service 4; they all originate at Services 1, 2, and 3, and are therefore 100% sampled.
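
To put numbers on that, here is a small back-of-the-envelope sketch. The function name and the request counts are made up for illustration; the only assumption is that each trace is sampled at the rate of the service where it starts:

```python
def effective_sample_rate(traffic):
    """traffic: list of (requests, root_sample_rate) pairs, one per
    origin of the traces that reach the service. Returns the expected
    fraction of the service's transactions that end up sampled."""
    total = sum(requests for requests, _ in traffic)
    if total == 0:
        return 0.0
    sampled = sum(requests * rate for requests, rate in traffic)
    return sampled / total


# Service 4 only receives calls from Services 1, 2, and 3 (all at 100%),
# so its own 1% setting never comes into play.
all_downstream = effective_sample_rate([(1000, 1.0), (1000, 1.0), (1000, 1.0)])

# Hypothetical mix: 900 requests arrive from a caller sampling at 10%,
# and 100 requests start at the service itself, configured at 1%.
mixed = effective_sample_rate([(900, 0.1), (100, 0.01)])
```

The first case works out to 1.0 (100%), matching the example above. The second works out to roughly 0.091, well above the service's own 1%, because most of its traffic inherits the caller's 10% decision.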

Thank you. That clarifies my doubt.
