We have Java agent 1.42, and the Elasticsearch and APM servers are on version 8.10.
I have tried sampling rates of .2 and .5. With both of those values, as well as with 1, response time and throughput are reported for all transactions. But the documentation describes a change in behavior after the 8.0 release, as quoted below.
Could you check and update how sampling rate works with 8.10 version?
As per documentation
By default, the agent will sample every transaction (e.g. request to your service). To reduce overhead and storage requirements, you can set the sample rate to a value between 0.0 and 1.0. (For pre-8.0 servers the agent still records and sends overall time and the result for unsampled transactions, but no context information, labels, or spans. When connecting to 8.0+ servers, the unsampled requests are not sent at all).
The agent sends the effective sampling rate along with each sampled trace event, and APM Server extrapolates metrics from that. For example, say the sampling rate is 0.5. Then every transaction that APM Server observes is counted as 2 (inverse sample rate = 1/0.5).
So say you set it to 0.1 (BTW, include the leading 0 please) and you have 100 transactions:
10 will be sampled with the complete context etc.
For the other 90, Elastic APM calculates transaction rate and latency metrics as described above, based on the sampled transactions, so they can be shown in visualizations and used in alerts, ML jobs etc.
This model works well at scale, but if you have very low transaction rates, you want to make sure that you have a fairly high sample rate.
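The extrapolation described above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not APM Server's actual code; the function name `extrapolate` and the sample durations are made up for the example.

```python
def extrapolate(sampled_durations_ms, sample_rate):
    """Estimate total transaction count and mean latency from sampled events.

    Hypothetical helper: each sampled transaction is weighted by the
    inverse of the sample rate, as described in the post above.
    """
    if not 0.0 < sample_rate <= 1.0:
        raise ValueError("sample_rate must be in (0, 1]")
    if not sampled_durations_ms:
        return 0.0, None

    weight = 1.0 / sample_rate  # one sampled event stands for this many transactions
    estimated_count = len(sampled_durations_ms) * weight
    # Every event carries the same weight here, so the weighted mean
    # equals the plain mean of the sampled durations.
    weighted_sum = sum(d * weight for d in sampled_durations_ms)
    return estimated_count, weighted_sum / estimated_count


# 100 real transactions at a 0.1 sample rate: about 10 events reach APM Server.
sampled = [120.0, 80.0, 100.0, 95.0, 105.0, 110.0, 90.0, 100.0, 100.0, 100.0]
count, mean_ms = extrapolate(sampled, 0.1)
print(f"estimated transactions: {count:.0f}, estimated mean latency: {mean_ms:.1f} ms")
```

With only a handful of sampled events the estimate depends heavily on which transactions happened to be picked, which is why a higher sample rate matters at low transaction rates.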
Hope that helps.
Yes, that makes sense. I was confused by the statement about 8.0+ versions. Could you explain what happens to the collection of errors and stack traces when sampling is set? Does the APM agent collect all errors and their stack traces? If so, is there any way to reduce the number of error stack traces of the same type to reduce storage?
@senyam08 all errors/exceptions are kept regardless of the sampling rate. In case you haven't found it yet, there's more documentation on sampling at Transaction sampling | APM User Guide [8.10] | Elastic
If so, is there any way to reduce the number of error stack traces of the same type to reduce storage?
Not out of the box. You could write an ingest pipeline that samples error documents (i.e. drops them with some probability), but the tricky part would be working out the right probability. That's not likely to catch rare exceptions though. This might be a job for a latest Transform, grouping by the error.grouping_key field. That field is based on a hash of the stacktrace, so you can use it to deduplicate exceptions.
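To make the deduplication idea concrete, here is a small local sketch in Python of what "keep the latest document per error.grouping_key" means. It does not call Elasticsearch or define a real transform; the function `latest_per_group`, the flattened field names, and the sample documents are assumptions for illustration only.

```python
def latest_per_group(error_docs):
    """Keep only the most recent error document for each grouping key.

    Hypothetical helper. Assumes each document is a dict with flattened
    "error.grouping_key" and "@timestamp" fields, and that timestamps are
    ISO 8601 strings in the same format, so string comparison matches
    chronological order.
    """
    latest = {}
    for doc in error_docs:
        key = doc["error.grouping_key"]
        current = latest.get(key)
        if current is None or doc["@timestamp"] > current["@timestamp"]:
            latest[key] = doc
    return list(latest.values())


# Made-up documents: two errors share a grouping key, one is rare.
docs = [
    {"error.grouping_key": "abc", "@timestamp": "2023-10-01T10:00:00Z", "message": "NPE"},
    {"error.grouping_key": "abc", "@timestamp": "2023-10-01T10:05:00Z", "message": "NPE"},
    {"error.grouping_key": "xyz", "@timestamp": "2023-10-01T09:00:00Z", "message": "Timeout"},
]
for doc in latest_per_group(docs):
    print(doc["error.grouping_key"], doc["@timestamp"], doc["message"])
```

Unlike probabilistic dropping, grouping this way keeps one document for every distinct key, so a rare exception is still represented.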