Kibana version: 7.6.1
Elasticsearch version: 7.6.1
APM Server version: 7.6.1
APM Agent language and version: Java Agent 1.14.0
I'm experimenting with Elastic APM using the Java Agent and am looking for ways to reduce processing, network utilization and storage on the server side without losing useful details. One of the applications I want to monitor is very chatty in terms of the number of SQL statements it executes, and it also has polling tasks that run every X seconds and execute a large number of SQL statements. The individual statements are fast (microseconds, well under 1 millisecond) and the overall transaction times are fine even with many of them (e.g. under a second), but they generate a lot of SQL spans, up to the agent's default cap of 500 spans per transaction (transaction_max_spans).
I could reduce the sample rate (transaction_sample_rate), but my concern is that I would then lose the nested spans of the transactions that are actually interesting. "Interesting" in this case means:
- Transactions that take longer than X
- Transactions that end in an error
Essentially, I'm looking to make a post-transaction decision on whether to send the captured spans. From what I can tell, the sampling decision has to be made upfront with minimal context, because it determines whether the agent captures spans at all. I'm not too worried about agent overhead here: for the most part the agent just captures the SQL statements and times them, which should be cheap relative to executing the statements themselves. Spans, however, are reported as soon as they complete (or at least placed in the reporting queue), so at that point the agent has no context about the outcome of the transaction.
What I would like is a way to defer the reporting decision until the transaction completes, or perhaps until X seconds have elapsed, to handle long-running transactions whose span data shouldn't be kept in memory longer than necessary.
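A minimal sketch of what such deferred reporting could look like, assuming a hypothetical per-transaction buffer (DeferredSpanBuffer, with spans reduced to plain strings and a Consumer standing in for the reporting queue). None of these names are Elastic APM agent APIs. Completed spans are held back, and the report/discard decision happens at transaction end; once a transaction has been running longer than the threshold it already qualifies, so the buffer flushes and passes later spans straight through instead of holding them in memory.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical sketch, NOT an Elastic APM agent API: completed spans are held back
// per transaction and the report/discard decision is made once the outcome is known.
class DeferredSpanBuffer {
    private final long thresholdMs;          // plays the role of only_report_spans_if_above_ms
    private final long transactionStartMs;
    private final Consumer<String> reporter; // stands in for the agent's reporting queue
    private final List<String> pending = new ArrayList<>();
    private boolean passThrough = false;     // set once the transaction is known to be interesting

    DeferredSpanBuffer(long thresholdMs, long transactionStartMs, Consumer<String> reporter) {
        this.thresholdMs = thresholdMs;
        this.transactionStartMs = transactionStartMs;
        this.reporter = reporter;
    }

    // Called when a span ends. The elapsed-time check piggybacks on span events;
    // a real implementation would probably also need a timer.
    void onSpanEnd(String span, long nowMs) {
        if (!passThrough && nowMs - transactionStartMs >= thresholdMs) {
            // Long-running transaction: it already qualifies, so release the
            // buffered spans instead of keeping them in memory until it ends.
            flush();
            passThrough = true;
        }
        if (passThrough) {
            reporter.accept(span);
        } else {
            pending.add(span);
        }
    }

    // Called when the transaction ends; returns true if its spans were reported.
    boolean onTransactionEnd(long endMs, boolean hadError) {
        if (passThrough || hadError || endMs - transactionStartMs >= thresholdMs) {
            flush();
            return true;
        }
        pending.clear(); // fast and successful: discard the spans
        return false;
    }

    int pendingCount() {
        return pending.size();
    }

    private void flush() {
        for (String span : pending) {
            reporter.accept(span);
        }
        pending.clear();
    }
}

class DeferredSpanDemo {
    public static void main(String[] args) {
        List<String> reported = new ArrayList<>();

        DeferredSpanBuffer fast = new DeferredSpanBuffer(1000, 0, reported::add);
        fast.onSpanEnd("SELECT 1", 5);
        fast.onTransactionEnd(200, false);   // dropped: under 1000 ms, no error

        DeferredSpanBuffer failed = new DeferredSpanBuffer(1000, 0, reported::add);
        failed.onSpanEnd("SELECT 2", 10);
        failed.onTransactionEnd(300, true);  // reported: transaction errored

        DeferredSpanBuffer slow = new DeferredSpanBuffer(1000, 0, reported::add);
        slow.onSpanEnd("SELECT 3", 500);     // buffered
        slow.onSpanEnd("SELECT 4", 1500);    // threshold crossed: flushes SELECT 3, then reports SELECT 4

        System.out.println(reported);        // [SELECT 2, SELECT 3, SELECT 4]
    }
}
```

The trade-off is memory: each in-flight, not-yet-interesting transaction holds up to transaction_max_spans completed spans until it ends or crosses the threshold.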
So I'm wondering whether a configuration like the following is possible:
- transaction_sample_rate = 1 (or something fairly high)
- only_report_spans_if_above_ms = Xms (a new option that would report spans only if the enclosing transaction took >= Xms OR encountered an error)
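The proposed semantics can be sketched as a two-stage policy read from elasticapm.properties-style text. transaction_sample_rate is an existing Java agent option (default 1.0); only_report_spans_if_above_ms is the hypothetical new option and is not something the agent understands today. The SpanReportingPolicy class is likewise illustrative, and the random draw in sample() is only a stand-in, not necessarily how the agent actually samples.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;
import java.util.Random;

// Sketch of the proposed two-stage policy. transaction_sample_rate is an existing
// Java agent option; only_report_spans_if_above_ms is hypothetical and does not exist.
class SpanReportingPolicy {
    final double sampleRate;
    final long onlyReportSpansIfAboveMs;

    SpanReportingPolicy(double sampleRate, long onlyReportSpansIfAboveMs) {
        this.sampleRate = sampleRate;
        this.onlyReportSpansIfAboveMs = onlyReportSpansIfAboveMs;
    }

    // Parses elasticapm.properties-style text (one key=value per line).
    static SpanReportingPolicy fromProperties(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        double rate = Double.parseDouble(props.getProperty("transaction_sample_rate", "1.0"));
        // 0 keeps today's behavior: every sampled transaction reports its spans.
        long threshold = Long.parseLong(props.getProperty("only_report_spans_if_above_ms", "0"));
        return new SpanReportingPolicy(rate, threshold);
    }

    // Stage 1, at transaction start: record spans at all? (A plain random draw
    // for illustration; not necessarily how the agent samples.)
    boolean sample(Random random) {
        return random.nextDouble() < sampleRate;
    }

    // Stage 2, at transaction end: send the recorded spans?
    boolean reportSpans(long durationMs, boolean hadError) {
        return hadError || durationMs >= onlyReportSpansIfAboveMs;
    }
}
```

For example, with transaction_sample_rate=1.0 and only_report_spans_if_above_ms=500, a successful 120 ms polling transaction would have its spans dropped, while a 120 ms transaction that errored would keep them.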
Something like this would reduce the storage size and processing pressure on Elasticsearch for the apm-spans index. I looked for an option to discard these kinds of transactions in the APM Server or Elasticsearch, but I'd also like to save network utilization by discarding them at the agent level. On the Elasticsearch side I tried to find a way to drop only the "uninteresting" spans, but it seems this would require some application-side processing: first fetch the transactions that were slow or errored (via apm-transaction and apm-error), then delete all spans not associated with them.
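For the Elasticsearch-side cleanup, a rough sketch of the second step, assuming the slow transactions and errored trace IDs have already been fetched: collect the trace IDs worth keeping and build a body for the _delete_by_query API that deletes spans outside those traces. Assumptions: the field names trace.id and transaction.duration.us follow the APM 7.x document layout, trace IDs are plain hex strings (so no JSON escaping is done), and the class and method names are hypothetical. Keeping whole traces rather than single transactions is a design choice here.

```java
import java.util.Collection;
import java.util.Set;
import java.util.TreeSet;
import java.util.stream.Collectors;

// Hypothetical helper for an application-side cleanup job; not part of any Elastic API.
class SpanCleanupQuery {
    // Minimal view of an APM transaction document (trace.id, transaction.duration.us).
    static final class TxSummary {
        final String traceId;
        final long durationUs;

        TxSummary(String traceId, long durationUs) {
            this.traceId = traceId;
            this.durationUs = durationUs;
        }
    }

    // Trace IDs worth keeping: slow transactions plus any trace that recorded an error.
    static Set<String> interestingTraceIds(Collection<TxSummary> txs,
                                           Collection<String> errorTraceIds,
                                           long thresholdMs) {
        Set<String> keep = new TreeSet<>(errorTraceIds); // sorted for a stable query body
        for (TxSummary tx : txs) {
            if (tx.durationUs >= thresholdMs * 1000) {
                keep.add(tx.traceId);
            }
        }
        return keep;
    }

    // Body for POST <span-index>/_delete_by_query: delete spans older than
    // olderThan (e.g. "now-1h") whose trace.id is NOT in the keep set.
    // Assumes hex trace IDs, so the values are not JSON-escaped.
    static String deleteByQueryBody(Set<String> keepTraceIds, String olderThan) {
        String terms = keepTraceIds.stream()
                .map(id -> "\"" + id + "\"")
                .collect(Collectors.joining(","));
        return "{\"query\":{\"bool\":{"
                + "\"filter\":[{\"range\":{\"@timestamp\":{\"lt\":\"" + olderThan + "\"}}}],"
                + "\"must_not\":[{\"terms\":{\"trace.id\":[" + terms + "]}}]"
                + "}}}";
    }
}
```

The @timestamp filter avoids deleting spans of transactions that are still running or not yet indexed. Note that a terms query is limited by the index.max_terms_count index setting, so a large keep set would need to be split across several requests, and this approach still pays the network and ingest cost that agent-side filtering would avoid.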