Choppy graphs with APM 8.x

Hi @Alex-T,

I assume this is the Throughput graph in the APM UI. My understanding is that the graph becomes choppy because the sample rate is very low (0.1%) and apm-server 8.x no longer stores unsampled transactions.

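In your case, assuming an average of 300 tpm and a 0.1% sample rate, apm-server receives about 0.3 sampled transactions per minute, i.e. roughly one every three minutes. For each transaction event it does receive, apm-server extrapolates a “representative count” by weighting the event by the inverse of the sample rate, so each transaction event represents 1000 transactions. Most intervals therefore show nothing, and an interval that happens to contain a sampled event jumps by 1000.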
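To make the arithmetic concrete, here is a small simulation sketch. It is not apm-server code; the function name `estimated_throughput` and its parameters are made up for illustration, and it assumes each transaction is sampled independently at the given rate.

```python
import random


def estimated_throughput(true_tpm, sample_rate, minutes, seed=0):
    """Return the per-minute throughput as extrapolated from sampled events.

    Hypothetical helper for illustration only: each of the `true_tpm`
    transactions in a minute is kept with probability `sample_rate`, and
    every kept event is weighted by 1 / sample_rate.
    """
    rng = random.Random(seed)
    weight = 1.0 / sample_rate  # "representative count" of one sampled event
    series = []
    for _ in range(minutes):
        sampled = sum(1 for _ in range(true_tpm) if rng.random() < sample_rate)
        series.append(sampled * weight)
    return series


# 300 tpm at a 0.1% sample rate, one hour of per-minute buckets
series = estimated_throughput(true_tpm=300, sample_rate=0.001, minutes=60)
print(series)
```

Every value in the printed series is a multiple of 1000, and most of them are 0, which is the choppy shape you see in the graph. With `sample_rate=1.0` the same function returns a flat 300 for every minute.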

In short, what you’re seeing in your graph is an unfortunate combination of a low sample rate, low throughput, and the cost-saving change in 8.x.

As for workarounds,

  • apm-server.sampling.keep_unsampled was removed in 8.0 and there is no way to enable it.
  • You may enable 100% sampling on the client and use tail-based sampling (TBS), so that apm-server sees every transaction event and can produce a less choppy graph while still storing only a sampled subset of events in Elasticsearch. An added benefit is that apm-server tail-based sampling is biased toward sampling slow or failed traces. However, it requires a fast disk and more CPU power. On a side note, 9.x comes with major performance improvements in TBS.
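
As a rough illustration of why the second workaround smooths the graph, here is a toy sketch. It is not how apm-server implements tail-based sampling (the real thing decides per trace, after the trace has completed); `tail_sample`, the `outcome` field, and the rates below are all assumptions made up for the example.

```python
import random


def tail_sample(transactions, rate_by_outcome, default_rate, seed=0):
    """Toy model: the server observes everything but stores only a subset.

    Hypothetical helper for illustration only. `rate_by_outcome` maps an
    outcome such as "failure" to the fraction of those events to keep;
    anything else is kept at `default_rate`.
    """
    rng = random.Random(seed)
    observed = len(transactions)  # throughput can be counted from every event
    stored = [
        tx for tx in transactions
        if rng.random() < rate_by_outcome.get(tx["outcome"], default_rate)
    ]
    return observed, stored


# One minute of traffic: 297 successes and 3 failures, all sent by the client
minute = [{"outcome": "success"}] * 297 + [{"outcome": "failure"}] * 3
observed, stored = tail_sample(minute, {"failure": 1.0}, default_rate=0.001)
print(observed, len(stored))
```

The observed count is the true 300 for that minute regardless of how few events are stored, and the failures are always kept. The trade-off is that every event has to reach apm-server first, which is where the extra disk and CPU cost comes from.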