When using the legacy APM Server approach for managing the lifecycle of APM traces, one can set a separate policy for spans and for transactions via event type mapping.
However, with the new recommended approach using the ILM feature of ELK, I see no way of doing this.
Since 8.0, we consider "transactions" and "spans" to be the same kind of data type -- that's why they end up in the same traces-apm-<namespace> data stream.
Another thing to bear in mind is that from 8.0 on, APM Server will discard "unsampled" transactions. We used to store a document for every transaction, even if it was (nominally) not sampled by the agent. These "unsampled" transactions would not have any associated span documents. Since 8.0 we only ever store sampled transactions and spans. So managing the lifecycle of transactions and spans independently makes less sense these days.
You could send transactions and spans to separate data streams with an ingest pipeline, and then manage the lifecycle of the data streams independently. It's not something that will be supported out of the box.
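As a rough sketch of that ingest pipeline idea (not an officially supported setup), the Python below only builds the JSON body for a pipeline that routes on the `processor.event` field using Elasticsearch's `reroute` processor. The dataset names `apm.transaction` and `apm.span` are hypothetical, the helper name `build_split_pipeline` is made up for illustration, and it assumes your Elasticsearch version ships the `reroute` processor, which arrived in a later 8.x release than 8.0.

```python
import json


def build_split_pipeline(transaction_dataset="apm.transaction",
                         span_dataset="apm.span"):
    """Build an ingest pipeline body that splits trace events by type.

    The dataset names are hypothetical placeholders; pick names that
    match index templates and lifecycle policies you have created.
    """

    def reroute(event_type, dataset):
        # "if" is a Painless condition evaluated against each document.
        # Assumption: the event type lives in processor.event.
        return {
            "reroute": {
                "if": f"ctx.processor?.event == '{event_type}'",
                "dataset": dataset,
            }
        }

    return {
        "description": "Route APM transactions and spans to separate data streams",
        "processors": [
            reroute("transaction", transaction_dataset),
            reroute("span", span_dataset),
        ],
    }


if __name__ == "__main__":
    # Print the body so it can be reviewed before being installed
    # with the ingest pipeline API.
    print(json.dumps(build_split_pipeline(), indent=2))
```

Each resulting data stream could then be given its own lifecycle policy through its index template. Whether the APM UI still finds the rerouted documents depends on the index patterns it is configured with, so treat this as something to try in a test cluster first.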
Another thing to bear in mind is that we are converging towards the OpenTelemetry data model, which does not make this distinction between "transactions" and "spans" -- so splitting data in this way may not be possible in the long term.
Hope this provides some useful context. If after reading this you would still like to manage the lifecycle of transactions and spans independently, I'd be keen to hear some details on why that is.
The use case is this:
We want to extend the lifecycle of transactions but not spans from the default of 10 days because transactions give us a sort of Big Picture view of our system usage that we'd like to keep for a bit longer, whereas spans are too granular to be needed for said Big Picture view.
Is this really such an edge case?
Yep, I can look into custom ingest pipelines as an alternative approach. It intimidates me slightly not to use the default data stream indices for trace events, but it's nice to have that option.
We want to extend the lifecycle of transactions but not spans from the default of 10 days because transactions give us a sort of Big Picture view of our system usage that we'd like to keep for a bit longer, whereas spans are too granular to be needed for said Big Picture view.
What details of the transactions are you interested in? In case you're not already aware, APM Server will pre-aggregate metrics from transactions, and these metrics are stored in a separate data stream with a much longer retention period by default.
That doesn't help if you want all of the details of the transactions -- URLs, User-Agent, etc. But if you just want a latency distribution for the transaction name/type per service, then they may be enough.
Is this really such an edge case?
I don't recall hearing other requests for this in recent years, but maybe I'm just not talking to the right people.
My thinking is that if you're interested in a specific transaction, then there's a good chance that you will want the full trace to make sense of it; and if you just want aggregate statistics, then you can use the metric documents.
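For the aggregate-statistics case, a query against the pre-aggregated metric documents might look roughly like the sketch below. It only builds the search body. The field names (`metricset.name`, `transaction.duration.histogram`), the service name `checkout`, and the helper name `build_latency_query` are assumptions for illustration, so check them against the mappings of your own `metrics-apm*` data streams before relying on this.

```python
import json


def build_latency_query(service_name, since="now-30d"):
    """Build a search body for per-transaction latency and request counts.

    Assumption: transaction metrics carry metricset.name == "transaction"
    and store latency in a histogram field; verify against your mappings.
    """
    latency_field = "transaction.duration.histogram"
    return {
        "size": 0,  # only aggregations are needed, not individual hits
        "query": {
            "bool": {
                "filter": [
                    {"term": {"metricset.name": "transaction"}},
                    {"term": {"service.name": service_name}},
                    {"range": {"@timestamp": {"gte": since}}},
                ]
            }
        },
        "aggs": {
            "by_transaction": {
                "terms": {"field": "transaction.name", "size": 20},
                "aggs": {
                    # Latency distribution per transaction name.
                    "latency_percentiles": {
                        "percentiles": {
                            "field": latency_field,
                            "percents": [50, 95, 99],
                        }
                    },
                    # On a histogram field, value_count adds up the
                    # bucket counts, i.e. the number of requests.
                    "total_requests": {
                        "value_count": {"field": latency_field}
                    },
                },
            }
        },
    }


if __name__ == "__main__":
    # "checkout" is a hypothetical service name.
    print(json.dumps(build_latency_query("checkout"), indent=2))
```

The printed body could be sent to the search API for an index pattern such as `metrics-apm*`. Because it reads only the metrics data streams, it keeps working after the trace documents have aged out.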
In case you're not already aware, APM Server will pre-aggregate metrics from transactions, and these metrics are stored in a separate data stream with a much longer retention period by default.
@axw You are correct: looking into the same metrics that build the APM overview dashboard is what we should probably be doing. And yep, as you probably guessed, I am indeed specifically interested in latency and total requests, which the pre-aggregations should provide.
OK thank you for guiding me in the right direction! You're a prince.