Custom ILM policies for APM DataStreams

Hi,

By using an ingest pipeline I am rerouting data to different APM data streams. ref [OpenTelemetry] data_stream.namespace and data_stream.dataset aren't being respected · Issue #10191 · elastic/apm-server · GitHub

[
  {
    "reroute": {
      "namespace": "{{service.environment}}"
    }
  }
]
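For context, that fragment is just the processors array of a custom ingest pipeline. A minimal sketch of the full pipeline definition, assuming it is installed as traces-apm@custom (the pipeline name is my assumption; the reroute processor itself is from above):

```json
PUT _ingest/pipeline/traces-apm@custom
{
  "description": "Reroute traces to a namespace per environment (sketch)",
  "processors": [
    {
      "reroute": {
        "namespace": "{{service.environment}}"
      }
    }
  ]
}
```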

I would like to set ILM policies like below for different environments.

traces-apm-dev --> 7d
traces-apm-qa --> 15d
traces-apm-prod --> 30d
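For reference, a retention tier like the 7d one above could be expressed as an ILM policy with a delete phase. A minimal sketch; the policy name traces-apm-7d and the rollover thresholds are my own:

```json
PUT _ilm/policy/traces-apm-7d
{
  "policy": {
    "phases": {
      "hot": {
        "actions": {
          "rollover": { "max_age": "7d", "max_primary_shard_size": "50gb" }
        }
      },
      "delete": {
        "min_age": "7d",
        "actions": { "delete": {} }
      }
    }
  }
}
```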

The custom ILM policy section in this document (Index lifecycle management | Elastic Observability [8.12] | Elastic) states that "Fleet creates a default *@custom component template for each data stream". But in fact it creates them per index template. So is there any other way to achieve custom ILM policies for different data streams?

For now I think you really just have one option: copy the traces-apm@package index template for each environment, and update the index pattern, priority, and ILM settings for each one. The priority must be higher than the traces-apm@package template.

Hi Andrew,

I believe you meant the component template?

I did try to clone the existing index template and update the index patterns and lifecycle policies, but when I do that the data streams for traces are not even getting created.

Screenshots of the index template:

I believe you meant the component template?

Oops, sorry for the confusion, I was referring to the component template name, but I actually meant the index templates.

I did try to clone the existing index template and update the index patterns and lifecycle policies, but when I do that the data streams for traces are not even getting created.

Do you see some errors in the APM Server or Elasticsearch logs?

Another thing you could try is indexing a document directly into the data stream from the dev console, which may surface the error more easily. Try this:

POST /traces-apm-dev/_doc
{
  "@timestamp": "2024-03-18",
  "observer": {
    "type": "apm-server",
    "version": "8.14.0"
  },
  "processor": {
    "event": "transaction"
  },
  "trace": {
    "id": "0123456789abcdef0123456789abcdef"
  },
  "data_stream": {
    "type": "traces",
    "dataset": "apm",
    "namespace": "dev"
  },
  "transaction": {
    "result": "Success",
    "duration": {"us": 32592},
    "representative_count": 1,
    "id": "945254c567a5417e"
  }
}

No errors logged in APM Server or in Elasticsearch.

But I made some progress when I tried to index a document using the content you shared.

{
    "error": {
        "root_cause": [
            {
                "type": "index_not_found_exception",
                "reason": "no such index [composable template [traces-apm-dev] forbids index auto creation]",
                "index_uuid": "_na_",
                "index": "composable template [traces-apm-dev] forbids index auto creation"
            }
        ],
        "type": "index_not_found_exception",
        "reason": "no such index [composable template [traces-apm-dev] forbids index auto creation]",
        "index_uuid": "_na_",
        "index": "composable template [traces-apm-dev] forbids index auto creation"
    },
    "status": 404
}

Got it working after updating the "Allow auto create" setting.


I have a similar requirement for logs and metrics. But this is too much maintenance overhead for users of the APM functionality. I believe most enterprises will have a similar requirement to manage data differently per environment (I see another post here: Indexing Application Trace Data into Separate Indices Based on Application Names - Elastic Observability / APM - Discuss the Elastic Stack). So is there any other way to achieve this (maybe in a future release)?

I am thinking about allowing variable configuration in component templates, like below, where the variables are parsed and applied when the first data stream is created.

It would be good to keep the APM templates managed instead of cloning and maintaining them.

(screenshot of the proposed variable configuration in a component template)

I agree, it's not a very ergonomic solution right now.

There are a couple of things that should help in the future. First, we are working towards supporting Data Stream lifecycle (DLM) out of the box in a future release -- maybe 8.14, unclear at this stage when DLM will be GA. With that you can define the retention duration directly in a component template.
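For illustration, with data stream lifecycle the retention can be declared directly in a component template rather than via an ILM policy. A sketch based on the 8.x data stream lifecycle API (in technical preview at the time of writing); using the traces-apm@custom template name here is my assumption:

```json
PUT _component_template/traces-apm@custom
{
  "template": {
    "lifecycle": {
      "data_retention": "7d"
    }
  }
}
```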

We already have support for customisable component templates (e.g. traces-apm@custom), but the template names do not currently support variables like in your example. For that, we would need something along the lines of what is proposed in Allow customizing managed data streams at different levels of granularity · Issue #97664 · elastic/elasticsearch · GitHub.

So the idea is that you would be able to create a component template like traces-apm-dev@custom, and that would automatically apply to any new traces-apm-dev data stream, without having to modify any index templates.
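If that proposal lands, the per-environment override might look something like the sketch below. Note that creating the component template works today; its automatic application to a traces-apm-dev data stream is the proposed (not yet implemented) part, and the traces-apm-7d policy name is hypothetical:

```json
PUT _component_template/traces-apm-dev@custom
{
  "template": {
    "settings": {
      "index.lifecycle.name": "traces-apm-7d"
    }
  }
}
```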

Ah. That is cool.

I understand this for traces, and it would be very helpful. But how will this support logs and metrics, which have a format like logs-apm.app.{{service.name}}-dev? In our scenario, we have around 150+ microservices. We can't create and maintain component templates for each service.

Edited: Please ignore this post :slight_smile: . I should have read the GitHub links first. They already have all the answers :slight_smile:


So to assign custom ILM policies for APM data streams, the steps below need to be followed (from the Kibana UI).

  • Clone the managed index templates (logs-apm.app or traces-apm).

  • Update the index patterns to match the desired data stream name.

  • Set the priority above 200 (200 is used for the Fleet-managed templates, so it has to be higher than that).

  • Enable "Allow auto create" (it's disabled by default).

  • Update the lifecycle in the index settings and save.
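The same steps can also be done from the dev console. A rough sketch, assuming an ILM policy named traces-apm-7d already exists; the composed_of list here is abbreviated and should be copied from your cloned template, and the priority value just needs to exceed 200:

```json
PUT _index_template/traces-apm-dev
{
  "index_patterns": ["traces-apm-dev"],
  "priority": 250,
  "allow_auto_create": true,
  "data_stream": {},
  "composed_of": ["traces-apm@package", "traces-apm@custom"],
  "template": {
    "settings": {
      "index.lifecycle.name": "traces-apm-7d"
    }
  }
}
```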


I know you said to ignore, but I can't help myself :wink:

But how will this support logs and metrics, which have a format like logs-apm.app.{{service.name}}-dev?

Another thing we've been working towards is moving away from having service-specific data streams by default. The idea is that we would send all logs to something like logs-generic-default by default, while enabling configurable routing using the reroute processor.

There are reasons why we haven't done this already, not least because it would be a breaking change. Also, it would almost certainly lead to data loss without further improvements to Elasticsearch, which we've been working on, to reduce the risk of mapping conflicts and document rejections when sending logs with possibly different fields to the same data stream.

Configurable metrics-apm datastream pattern. · Issue #8182 · elastic/apm-server · GitHub is similar, but focuses on metrics; there is some overlap with the solution for logs.

I'd be keen to hear if this sounds like an improvement to you, or if you anticipate any issues with that.

Well, they say curiosity is the mother of invention :smiley:

Ah, the script looks like a good workaround. I will try this for both logs and metrics and come back.

Hi @axw ,

Sorry for getting back late; I got pulled into some other priorities. Here are some observations.

  • The script pipeline didn't work for me. I couldn't figure out exactly why; I will try to replicate the steps and share the details.

  • Managed to get what I wanted from the reroute processor itself.

  1. App logs and metrics to a single data stream based on {{service.environment}}: two pipelines, logs-apm.app@custom and metrics-apm.app@custom, with the reroute processor below.
"processors": [
  {
    "reroute": {
      "description": "Logs pipeline to replace dataset and namespace",
      "dataset": "apm.app.all",
      "namespace": "{{service.environment}}"
    }
  }
]
  2. Traces and app errors based on {{service.environment}}: two pipelines, logs-apm.error@custom and traces-apm@custom.
"processors" : [
    {
      "reroute" : {
        "description" : "Pipeline to replace the namespace to service.environment",
        "namespace": "{{service.environment}}"
      }
    }
  ]


Also tried to make use of a global pipeline to set the namespace and dataset for the apm.app data streams, but it looks like the reroute processor executes only once, from the global pipeline. So I ended up still having multiple data streams for app logs and metrics.

PUT _ingest/pipeline/global@custom
{
    "description": "Global pipeline",
    "processors": [
      {
        "reroute" : {
          "description" : "Global pipeline to replace the namespace to service.environment",
          "namespace": "{{service.environment}}"
        }
      }
    ]
}

PUT _ingest/pipeline/logs-apm.app@custom
{
    "description": "Logs pipeline",
    "processors": [
      {
        "reroute" : {
          "description" : "Logs pipeline to replace dataset",
           "dataset": "apm.app.all"
        }
      }
    ]
}

Also tried to use logs@custom and logs-apm.integration@custom. But when they replace the dataset from apm.error to apm.app.all for errors, the APM UI gets broken for Errors and Failed transaction rate. I believe this is expected because these charts depend on the logs-apm.error-* data stream.

Thanks for your feedback! Yes, reroute essentially terminates the pipeline. Reroute processor | Elasticsearch Guide [8.12] | Elastic says:

After a reroute processor has been executed, all the other processors of the current pipeline are skipped, including the final pipeline. If the current pipeline is executed in the context of a Pipeline, the calling pipeline will be skipped, too. This means that at most one reroute processor is ever executed within a pipeline, allowing to define mutually exclusive routing conditions, similar to a if, else-if, else-if, … condition.
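The mutually exclusive routing described in that quote can be sketched with conditional reroute processors. A hypothetical pipeline (the pipeline name and environment values are my own); at most one of these fires, and once a reroute executes the rest of the pipeline is skipped:

```json
PUT _ingest/pipeline/route-by-environment
{
  "processors": [
    { "reroute": { "if": "ctx.service?.environment == 'dev'",  "namespace": "dev"  } },
    { "reroute": { "if": "ctx.service?.environment == 'qa'",   "namespace": "qa"   } },
    { "reroute": { "if": "ctx.service?.environment == 'prod'", "namespace": "prod" } }
  ]
}
```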

I think for errors, in the future we may stop writing them to a special data stream (or stop assuming they're written to a special data stream) and instead just search logs with certain fields. In my opinion you should be able to organise them in different ways through rerouting rules without affecting the UI.
