Setting APM trace lifecycle policy per type (transaction v span) using custom policy

cozog · May 7, 2024, 8:42pm

When using the legacy APM Server approach to managing the lifecycle of APM traces, one is able to set a separate policy for spans and for transactions via event type mapping.

However, with the new recommended approach using the ILM feature of ELK, I see no way of doing this.

Please advise!

axw · May 13, 2024, 1:25am

@cozog apologies for the delay in responding.

As described in Index lifecycle management (APM User Guide, 7.17), you can create different ILM policies per data stream. Each data stream organises data by data type.
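For illustration only, here is a minimal Python sketch of what such a per-data-stream policy could look like. It builds the JSON body you would send with `PUT _ilm/policy/<name>`: a rollover in the hot phase, then a delete phase. The policy name `traces-apm-longer-retention` and the ages are hypothetical, not APM defaults. Attaching the policy to a data stream is not shown, because that step depends on your setup.

```python
import json


def ilm_policy_body(rollover_max_age: str, delete_after: str) -> dict:
    """Build a request body for PUT _ilm/policy/<name>.

    The phase and action names (hot, rollover, delete) are standard ILM
    syntax; the ages passed in are illustrative, not APM defaults.
    """
    return {
        "policy": {
            "phases": {
                "hot": {
                    "actions": {
                        # Start a new backing index after this age, or once a
                        # primary shard reaches 50gb, whichever comes first.
                        "rollover": {
                            "max_age": rollover_max_age,
                            "max_primary_shard_size": "50gb",
                        }
                    }
                },
                "delete": {
                    # min_age counts from rollover, so the oldest documents in
                    # a backing index live roughly rollover_max_age + delete_after.
                    "min_age": delete_after,
                    "actions": {"delete": {}},
                },
            }
        }
    }


if __name__ == "__main__":
    name = "traces-apm-longer-retention"  # hypothetical policy name
    print(f"PUT _ilm/policy/{name}")
    print(json.dumps(ilm_policy_body("1d", "30d"), indent=2))
```

Only the ages would differ between data streams. The overall shape stays the same, which is why "one policy per data stream" is the natural unit of control here.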

Since 8.0, we consider "transactions" and "spans" to be the same kind of data type -- that's why they end up in the same traces-apm-<namespace> data stream.

Another thing to bear in mind is that from 8.0 on, APM Server will discard "unsampled" transactions. We used to store a document for every transaction, even if it was (nominally) not sampled by the agent. These "unsampled" transactions would not have any associated span documents. Since 8.0 we only ever store sampled transactions and spans. So managing the lifecycle of transactions and spans independently makes less sense these days.

You could send transactions and spans to separate data streams with an ingest pipeline, and then manage the lifecycle of the data streams independently. It's not something that will be supported out of the box.
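A rough sketch of that idea, in Python, under several assumptions. It assumes your Elasticsearch version has the `reroute` ingest processor. It assumes transaction documents carry `processor.event: "transaction"`. It also assumes the APM integration calls a `traces-apm@custom` pipeline where this can be hooked in. The dataset name `apm.transactions` is hypothetical, and the routed data stream would need a matching index template of your own (not shown). The `expected_target` helper only mimics the routing locally so the logic can be checked without a cluster.

```python
import json

# Hypothetical dataset name for the split-out transactions data stream.
TRANSACTIONS_DATASET = "apm.transactions"


def routing_pipeline_body() -> dict:
    """Request body for PUT _ingest/pipeline/<name>.

    The reroute processor sends documents whose processor.event is
    "transaction" to traces-<TRANSACTIONS_DATASET>-<namespace>; spans and
    everything else stay in traces-apm-<namespace>.
    """
    return {
        "description": "Route APM transactions to their own data stream (sketch)",
        "processors": [
            {
                "reroute": {
                    # Painless null-safe access: docs without processor.event are skipped.
                    "if": "ctx.processor?.event == 'transaction'",
                    "dataset": TRANSACTIONS_DATASET,
                }
            }
        ],
    }


def expected_target(doc: dict, namespace: str = "default") -> str:
    """Local illustration of where a document should end up (no cluster needed)."""
    event = doc.get("processor", {}).get("event")
    if event == "transaction":
        return f"traces-{TRANSACTIONS_DATASET}-{namespace}"
    return f"traces-apm-{namespace}"


if __name__ == "__main__":
    # Assumed hook point; verify the custom pipeline name for your APM version.
    print("PUT _ingest/pipeline/traces-apm@custom")
    print(json.dumps(routing_pipeline_body(), indent=2))
    print(expected_target({"processor": {"event": "transaction"}}))
    print(expected_target({"processor": {"event": "span"}}))
```

Once transactions land in their own data stream, that stream can be given its own ILM policy independently of `traces-apm-*`.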

Another thing to bear in mind is that we are converging towards the OpenTelemetry data model, which does not make this distinction between "transactions" and "spans" -- so splitting data in this way may not be possible in the long term.

Hope this provides some useful context. If after reading this you would still like to manage lifecycle of transactions and spans independently, I'd be keen to hear some details on why that is.

cozog · May 13, 2024, 2:34pm

@axw Thanks much, no worries w/r/t delay!

I understand, you've confirmed my suspicions.

The use case is this:
We want to extend the lifecycle of transactions but not spans from the default of 10 days because transactions give us a sort of Big Picture view of our system usage that we'd like to keep for a bit longer, whereas spans are too granular to be needed for said Big Picture view.

Is this really such an edge case?

Yep, I can look into custom ingest pipelines as an alternative approach. It intimidates me slightly not to use the default data streams for trace events, but it's nice to have that option.

axw · May 14, 2024, 12:39am

> We want to extend the lifecycle of transactions but not spans from the default of 10 days because transactions give us a sort of Big Picture view of our system usage that we'd like to keep for a bit longer, whereas spans are too granular to be needed for said Big Picture view.

What details of the transactions are you interested in? In case you're not already aware, APM Server will pre-aggregate metrics from transactions, and these metrics are stored in a separate data stream with a much longer retention period by default.

That doesn't help if you want all of the details of the transactions -- URLs, User-Agent, etc. But if you just want a latency distribution for the transaction name/type per service, then they may be enough.
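To make "a latency distribution" concrete, here is a small Python sketch. It assumes the transaction metric documents store durations in an Elasticsearch `histogram` field, assumed here to be `transaction.duration.histogram`, with parallel `values` and `counts` arrays. From one such histogram you can recover the total request count and approximate latency statistics. The numbers below are made up.

```python
from typing import Sequence, Tuple


def summarize_histogram(values: Sequence[float], counts: Sequence[int]) -> Tuple[int, float]:
    """Return (total_requests, approximate_mean) for a pre-aggregated histogram.

    values[i] is a representative duration for bucket i and counts[i] is how
    many transactions fell into it. The mean is approximate because each
    bucket is collapsed to a single value.
    """
    if len(values) != len(counts):
        raise ValueError("values and counts must have the same length")
    total = sum(counts)
    if total == 0:
        return 0, 0.0
    mean = sum(v * c for v, c in zip(values, counts)) / total
    return total, mean


def approx_percentile(values: Sequence[float], counts: Sequence[int], pct: float) -> float:
    """Smallest bucket value whose cumulative count reaches pct percent (0-100)."""
    total = sum(counts)
    if total == 0:
        raise ValueError("empty histogram")
    threshold = total * pct / 100.0
    running = 0
    for value, count in sorted(zip(values, counts)):
        running += count
        if running >= threshold:
            return value
    return max(values)  # unreachable for pct <= 100; kept as a safe fallback


if __name__ == "__main__":
    # Hypothetical histogram: 3 requests near 1000, 5 near 2000, 2 near 5000.
    vals, cnts = [1000, 2000, 5000], [3, 5, 2]
    print(summarize_histogram(vals, cnts))
    print(approx_percentile(vals, cnts, 50), approx_percentile(vals, cnts, 95))
```

In practice you would let Elasticsearch aggregations compute these values across many metric documents. The point is that request counts and latency survive pre-aggregation, while per-request details such as URLs do not.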

> Is this really such an edge case?

I don't recall hearing other requests for this in recent years, but maybe I'm just not talking to the right people.

My thinking is that if you're interested in a specific transaction, then there's a good chance that you will want the full trace to make sense of it; and if you just want aggregate statistics, then you can use the metric documents.

cozog · May 14, 2024, 2:54am

> In case you're not already aware, APM Server will pre-aggregate metrics from transactions, and these metrics are stored in a separate data stream with a much longer retention period by default.

@axw You are correct: looking into the same metrics that build the APM overview dashboard is what we should probably be doing. And yep, as you probably guessed, I am specifically interested in latency and total requests, which the pre-aggregations should provide.

OK, thank you for guiding me in the right direction! You're a prince.
