The custom ILM policy mentioned in this document (Index lifecycle management | Elastic Observability [8.12] | Elastic) states that "Fleet creates a default *@custom component template for each data stream". But in fact it creates them per index template. So is there any other way to achieve custom ILM policies based on different data streams?
For now I think you really just have one option: copy the traces-apm@package index template for each environment, and update the index pattern, priority, and ILM settings for each one. The priority must be higher than the traces-apm@package template.
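A rough sketch of what one per-environment copy might look like; the template name, index pattern, priority value, and ILM policy name here are placeholders, and the composed_of list and mappings should be copied from the existing managed template:
PUT _index_template/traces-apm-dev
{
  "index_patterns": ["traces-apm-dev"],
  "priority": 250,
  "data_stream": {},
  "composed_of": ["traces-apm@package", "traces-apm@custom"],
  "template": {
    "settings": {
      "index.lifecycle.name": "traces-apm-dev-policy"
    }
  }
}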
I did try to clone the existing index template and update the index patterns and lifecycle policies, but when I do that the data stream for traces is not even getting created.
Oops, sorry for the confusion, I was referring to the component template name, but I actually meant the index templates.
I did try to clone the existing index template and update the index patterns and lifecycle policies, but when I do that the data stream for traces is not even getting created.
Do you see some errors in the APM Server or Elasticsearch logs?
Another thing you could try is indexing a document directly into the data stream from the dev console, which may surface the error more easily. Try this:
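For example, something along these lines (the data stream name traces-apm-dev and the field values are just placeholders for whatever namespace you configured):
POST traces-apm-dev/_doc
{
  "@timestamp": "2024-03-01T00:00:00.000Z",
  "message": "test document"
}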
No errors logged in APM Server or in Elasticsearch.
But there was some progress when I tried to index a document using the content you shared.
{
"error": {
"root_cause": [
{
"type": "index_not_found_exception",
"reason": "no such index [composable template [traces-apm-dev] forbids index auto creation]",
"index_uuid": "_na_",
"index": "composable template [traces-apm-dev] forbids index auto creation"
}
],
"type": "index_not_found_exception",
"reason": "no such index [composable template [traces-apm-dev] forbids index auto creation]",
"index_uuid": "_na_",
"index": "composable template [traces-apm-dev] forbids index auto creation"
},
"status": 404
}
I am thinking about allowing variable configuration in component templates, like below, and then parsing and applying the data when the first data stream is created.
It is good to have the APM templates as managed instead of cloning and maintaining them.
I agree, it's not a very ergonomic solution right now.
There are a couple of things that should help in the future. First, we are working towards supporting Data Stream lifecycle (DLM) out of the box in a future release -- maybe 8.14, unclear at this stage when DLM will be GA. With that you can define the retention duration directly in a component template.
So the idea is that you would be able to create a component template like traces-apm-dev@custom, and that would automatically apply to any new traces-apm-dev data stream, without having to modify any index templates.
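Once that is available, setting the retention per environment could look roughly like this (traces-apm-dev@custom and the 30d retention are just examples):
PUT _component_template/traces-apm-dev@custom
{
  "template": {
    "lifecycle": {
      "data_retention": "30d"
    }
  }
}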
I understand this for traces. It would be very helpful. But how will this support logs and metrics, which have a format like logs-apm.app.{{service.name}}-dev? In our scenario, we have around 150+ microservices. We can't create and maintain component templates for each service.
Edited: Please ignore this post. I should have read the GitHub links first. They have all the answers already.
I know you said to ignore, but I can't help myself
But how will this support logs and metrics, which have a format like logs-apm.app.{{service.name}}-dev?
Another thing we've been working towards is moving away from having service-specific data streams by default. The idea is that we would send all logs to something like logs-generic-default by default, while enabling configurable routing using the reroute processor.
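A hypothetical routing rule could look like this (the pipeline name, condition, and dataset are made up for illustration):
PUT _ingest/pipeline/logs@custom
{
  "processors": [
    {
      "reroute": {
        "description": "Send checkout service logs to their own dataset",
        "if": "ctx.service?.name == 'checkout'",
        "dataset": "checkout"
      }
    }
  ]
}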
There are reasons why we haven't done this already, not least of which is because it would be a breaking change. Also, it would almost certainly lead to data loss without more improvements to Elasticsearch, which we've been working on, to reduce the risk of mapping conflicts and document rejections when sending logs with possibly different fields to the same data stream.
Sorry for getting back late. Got into some other priorities. Here are some observations.
The script pipeline didn't work for me. I couldn't figure out exactly why. I will try to replicate the steps and share the details.
Managed to get what I wanted from the reroute processor itself.
App logs and metrics go to a single data stream based on {{service.environment}}: 2 pipelines, logs-apm.app@custom and metrics-apm.app@custom, with the reroute processor below.
Traces and app errors based on {{service.environment}}: 2 pipelines, logs-apm.error@custom and traces-apm@custom, with the same reroute processor.
"processors" : [
{
"reroute" : {
"description" : "Pipeline to replace the namespace to service.environment",
"namespace": "{{service.environment}}"
}
}
]
Also tried to make use of the global pipeline for the namespace and the dataset for apm.app data streams, but it looks like the reroute processor executes only once, from the global pipeline. So I ended up still having multiple data streams for app logs and metrics.
PUT _ingest/pipeline/global@custom
{
"description": "Global pipeline",
"processors": [
{
"reroute" : {
"description" : "Global pipeline to replace the namespace to service.environment",
"namespace": "{{service.environment}}"
}
}
]
}
PUT _ingest/pipeline/logs-apm.app@custom
{
"description": "Logs pipeline",
"processors": [
{
"reroute" : {
"description" : "Logs pipeline to replace dataset",
"dataset": "apm.app.all"
}
}
]
}
Also tried to use logs@custom and logs-apm.integration@custom. But when they replace the dataset from apm.error to apm.app.all for errors, the APM UI breaks for Errors and Failed transaction rate. I believe this is expected because these charts depend on the data stream logs-apm.error-*.
After a reroute processor has been executed, all the other processors of the current pipeline are skipped, including the final pipeline. If the current pipeline is executed in the context of a Pipeline processor, the calling pipeline will be skipped, too. This means that at most one reroute processor is ever executed within a pipeline, allowing you to define mutually exclusive routing conditions, similar to an if, else-if, else-if, … condition.
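In other words, several reroute processors in one pipeline, each guarded by its own condition, behave like an if / else-if / else chain. A sketch, with made-up environment values and a fallback namespace:
PUT _ingest/pipeline/logs-apm.app@custom
{
  "processors": [
    {
      "reroute": {
        "if": "ctx.service?.environment == 'dev'",
        "namespace": "dev"
      }
    },
    {
      "reroute": {
        "if": "ctx.service?.environment == 'staging'",
        "namespace": "staging"
      }
    },
    {
      "reroute": {
        "description": "Fallback when no environment matched",
        "namespace": "default"
      }
    }
  ]
}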
I think for errors, in the future we may stop writing them to a special data stream (or stop assuming they're written to a special data stream) and instead just search logs with certain fields. In my opinion you should be able to organise them in different ways through rerouting rules without affecting the UI.