How does sampling work

manohar_deepu · September 5, 2020, 6:46am

Hi,
Can you please clarify these questions about sampling?

1. Do agents push all the metrics to the APM Server irrespective of sampling, with the APM Server choosing which data to ingest based on the sampling percentage?
2. Or do the APM agents themselves discard some data based on the sampling rate at the machine level?
3. If the working mode is point 1, what is the use of having a sampling rate configuration at the agent level?

stephenb · September 7, 2020, 2:01am

#2) The Java APM agent performs the sampling before the data is sent to the APM server.

To be clear, even when the transaction sample rate is set to a value less than 1.0, the time and duration of every transaction are still recorded. When a transaction is not sampled, it simply does not include the spans / detailed trace information.
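A minimal sketch of that distinction, in Python purely for illustration. This is a toy model, not the Java agent's data structures or wire format; `Span`, `Transaction`, and `to_report` are hypothetical names invented for this example.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Span:
    name: str
    duration_ms: float


@dataclass
class Transaction:
    name: str
    duration_ms: float
    sampled: bool
    spans: List[Span] = field(default_factory=list)


def to_report(tx: Transaction) -> dict:
    """Build the payload a hypothetical agent would send for one transaction."""
    # Name and duration are reported for every transaction, sampled or not.
    report = {"name": tx.name, "duration_ms": tx.duration_ms, "sampled": tx.sampled}
    if tx.sampled:
        # Span details are attached only when the transaction is sampled.
        report["spans"] = [
            {"name": s.name, "duration_ms": s.duration_ms} for s in tx.spans
        ]
    return report
```

In this model an unsampled transaction still contributes to throughput and latency figures, but there is nothing to drill into for it.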

Perhaps also look at dynamic profiling since you appear to be using the Java agent.

I would suggest taking a good look at all the settings for sampling, reporting, minimum span durations, etc., as the Java agent's behavior is highly customizable.

https://www.elastic.co/guide/en/apm/agent/java/current/configuration.html

manohar_deepu · September 7, 2020, 5:34am

Thanks for the reply, but my question is more about this: if the agent itself discards unsampled data, how do multiple microservices involved in distributed tracing make sure that, when the originating transaction is sampled, they also send their data irrespective of their own sampling rate?

For example, suppose service 1 and service 2 are both on APM and service 1 calls service 2.
Service 1's sampling rate is 0.1 and service 2's sampling rate is 0.01.

Then, irrespective of service 2's sampling rate, if service 2 sees that the call from service 1 is being sampled, it has to push its data as well (transaction + span data). This will take service 2's effective sampling rate above 0.01, right?

Can you please clarify the above scenario?

stephenb · September 7, 2020, 5:43am

Here is some text from a discussion I had earlier on the subject.

In addition to that, the transferred trace context also contains the sampling decision of the calling transaction, so that the root transaction of the trace makes the sampling decision for all “downstream” transactions of that trace.
This way, distributed tracing still works for 100% of the transactions and the sampling decisions are consistent within traces across different services.
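To make the "sampling decision travels with the trace context" part concrete, here is a small Python sketch that reads the sampled flag from a header value. It assumes the W3C Trace Context `traceparent` format, version 00 (`version-traceid-parentid-flags`, where bit 0x01 of the flags means "sampled"); which header names and versions a given agent actually sends is not covered in this thread, and `is_sampled` is a hypothetical helper, not an agent API.

```python
def is_sampled(traceparent: str) -> bool:
    """Return the caller's sampling decision carried in a traceparent value."""
    parts = traceparent.strip().split("-")
    if len(parts) != 4:
        raise ValueError("expected version-traceid-parentid-flags")
    _version, trace_id, parent_id, flags = parts
    if len(trace_id) != 32 or len(parent_id) != 16 or len(flags) != 2:
        raise ValueError("unexpected field length in traceparent")
    # The lowest bit of the flags byte is the "sampled" flag.
    return bool(int(flags, 16) & 0x01)


# Example value in the W3C format; the trailing "01" marks the trace as sampled.
header = "00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01"
print(is_sampled(header))
```

A downstream service that receives such a header does not need to roll its own dice; it can read the decision the root already made.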

Hopefully that makes sense.

I suggest setting up your services, taking a look, and then tuning from there.

manohar_deepu · September 7, 2020, 6:09am

I have already set up the cluster and am running APM.

My question is more for the purpose of understanding.

By this statement, we are essentially saying that service 2's sampling rate can shoot above its own configured rate of 0.01, since it has to adhere to the caller's sampling decision.

If there are multiple callers with a high sampling rate, then, since the "called" microservice has to adhere to the sampling decision that has already been made by the caller, it can overshoot its own sampling rate.

Example: services 1, 2, and 3 call service 4.
Services 1, 2, and 3 are on a 100% sampling rate and service 4 is on a 1% sampling rate.
Service 4 has to push its data for all the calls from 1, 2, and 3, as every call by 1, 2, and 3 is being sampled.
Hence, if you look at service 4's sample rate, it will be 100% instead of 1%.

axw · September 7, 2020, 6:22am

manohar_deepu wrote: "By this statement, we are essentially saying that service 2's sampling rate can shoot above its own configured rate of 0.01, since it has to adhere to the caller's sampling decision."

The sampling rate configured for an agent only affects traces started at that agent. If the agent is continuing a trace that started elsewhere, it will honour the propagated sampling decision.
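That rule can be written down in a few lines. The following Python sketch is only a model of the behaviour described here, not the agent's code; `decide_sampled` and its parameters are hypothetical names.

```python
import random
from typing import Callable, Optional


def decide_sampled(
    own_rate: float,
    incoming_decision: Optional[bool],
    rng: Callable[[], float] = random.random,
) -> bool:
    """Decide whether a transaction is sampled.

    incoming_decision is None when this service starts the trace, and
    True/False when the trace was started upstream.
    """
    if incoming_decision is not None:
        # Continuing a trace: honour the propagated decision, ignore own_rate.
        return incoming_decision
    # Root of a new trace: the locally configured rate applies.
    return rng() < own_rate
```

With this model, a service configured at 0.01 still reports a sampled transaction whenever `incoming_decision` is `True`, and reports an unsampled one whenever it is `False`, even if its own rate is 1.0.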

manohar_deepu wrote: "Example: services 1, 2, and 3 call service 4.
Services 1, 2, and 3 are on a 100% sampling rate and service 4 is on a 1% sampling rate.
Service 4 has to push its data for all the calls from 1, 2, and 3, as every call by 1, 2, and 3 is being sampled.
Hence, if you look at service 4's sample rate, it will be 100% instead of 1%."

Correct. The sampling rate is applied to traces as a whole, and not to the transactions of one specific service/agent. In this example, it sounds like no traces originate at Service 4; they all originate at Services 1, 2, and 3, and therefore are 100% sampled.
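The arithmetic behind this can be sketched as an expected-value calculation. The Python below is illustrative only: `effective_sample_rate` is a hypothetical helper, the call counts are made-up numbers, and it assumes each caller's listed rate is the rate at which the traces reaching this service were sampled at their root.

```python
from typing import List, Tuple


def effective_sample_rate(
    own_rate: float,
    own_root_traces: int,
    upstream: List[Tuple[int, float]],
) -> float:
    """Expected fraction of this service's transactions that are sampled.

    upstream holds (calls_received, trace_sample_rate) per caller.
    """
    total = own_root_traces + sum(calls for calls, _ in upstream)
    if total == 0:
        return 0.0
    # Own rate applies only to traces started here; continued traces
    # inherit the decision made at their root.
    sampled = own_root_traces * own_rate
    sampled += sum(calls * rate for calls, rate in upstream)
    return sampled / total


# Service 4 at 1%, starting no traces itself, called 100 times each by
# three services that sample at 100%.
print(effective_sample_rate(0.01, 0, [(100, 1.0), (100, 1.0), (100, 1.0)]))
```

For the scenario in the question this comes out at 1.0, i.e. 100%. If Service 4 also started some traces of its own, only those would be subject to its 1% setting, and the overall figure would land somewhere between 1% and 100% depending on the traffic mix.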

manohar_deepu · September 7, 2020, 7:15am

Thank you. That clarifies my doubt.

system · September 28, 2020, 3:15am

This topic was automatically closed 20 days after the last reply. New replies are no longer allowed.