Elastic Fleet: change data stream name based on a field value

Hi, I need to set different ILM policies based on a field value in my logs (one for PROD, another for NONPROD), so I created an ingest pipeline that rewrites the _index field, as @ruflin suggested in a GitHub issue. It works fine when I manually reindex the data stream through the pipeline with the new processors.
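For reference, this is roughly the manual reindex call that worked (a sketch; the data stream name is illustrative, and the pipeline's set processors override the destination with the renamed _index):

POST _reindex
{
  "source": {
    "index": "logs-enterpriselogs-default"
  },
  "dest": {
    "index": "logs-enterpriselogs-default",
    "op_type": "create",
    "pipeline": "enterpriselogs-pipeline"
  }
}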

The problem is that new incoming logs are not being ingested at all, neither under the original index name set in Fleet nor under the renamed index names.

How can I achieve this with Fleet?

Thanks

Do you get any error during ingest time? What are you using to ingest logs? Elastic Agent? Could you share the config and your ingest pipeline?

Thanks @ruflin for your quick answer.

I'm using the Custom Logs integration to parse some JSON files, and I add the pipeline name in the advanced settings section.

This is my config file for the custom logs integration:

json.keys_under_root: true
json.add_error_key: true
json.overwrite_keys: true
tags: ["esb"]
pipeline: enterpriselogs-pipeline

This is my enterpriselogs-pipeline:

{
  "description": "Common Services and Webmethods logs",
  "processors": [
    {
      "drop": {
        "if": "ctx.event_context?.app_runtime?.app_environment == null",
        "ignore_failure": true
      }
    },
    {
      "split": {
        "field": "business_context_id",
        "separator": ",",
        "ignore_missing": true,
        "ignore_failure": true
      }
    },
    {
      "date": {
        "field": "eventdate",
        "formats": [
          "MM-dd-yyyy HH:mm:ss.SSS"
        ],
        "ignore_failure": true
      }
    },
    {
      "set": {
        "field": "_index",
        "value": "logs-enterpriselogs-{{{data_stream.namespace}}}-prod",
        "ignore_empty_value": true,
        "if": "ctx.event_context?.app_runtime?.app_environment == 'PROD' ",
        "ignore_failure": true,
        "media_type": "text/plain"
      }
    },
    {
      "set": {
        "field": "_index",
        "value": "logs-enterpriselogs-{{{data_stream.namespace}}}-nonprod",
        "ignore_empty_value": true,
        "if": "ctx.event_context?.app_runtime?.app_environment != 'PROD' ",
        "ignore_failure": true,
        "media_type": "text/plain"
      }
    }
  ],
  "on_failure": [
    {
      "set": {
        "field": "error_information",
        "value": "Processor {{ _ingest.on_failure_processor_type }} with tag {{ _ingest.on_failure_processor_tag }} in pipeline {{ _ingest.on_failure_pipeline }} failed with message {{ _ingest.on_failure_message }}"
      }
    }
  ]
}

Thanks again

Also, how can I recover the logs I lost during the process? In regular Metricbeat I would remove the registry folder, but I'm not sure how to achieve this with Fleet or whether there is a different way to do it.

Appreciate your help

@ruflin I also tried using a script processor, with no luck. Any idea? :frowning: Thanks
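For completeness, this is roughly the kind of script processor I tried instead of the two set processors (a sketch, not my exact code; field names as in the pipeline above):

{
  "script": {
    "lang": "painless",
    "source": "def env = ctx.event_context?.app_runtime?.app_environment; if (env != null && ctx.data_stream != null) { ctx._index = 'logs-enterpriselogs-' + ctx.data_stream.namespace + (env == 'PROD' ? '-prod' : '-nonprod'); }",
    "ignore_failure": true
  }
}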

I ended up doing the following:

  • created a new integration per environment, so now we have one for prod and another for nonprod; this separates the indices and sends the documents to different ingest pipelines
  • created 2 ingest pipelines, one that drops PROD docs and another that drops NONPROD docs (the nonprod one is sketched after this list)
  • adjusted the index templates to match these new data streams
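
For reference, the nonprod pipeline looks roughly like this (a sketch; the prod one just inverts the condition):

{
  "description": "Drop PROD documents so only NONPROD remains",
  "processors": [
    {
      "drop": {
        "if": "ctx.event_context?.app_runtime?.app_environment == 'PROD'"
      }
    }
  ]
}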

Is there a better way to achieve it?

Hi @Gustavo_Llermaly. For Filebeat, the registry still exists, but it lives in the data directory of Elastic Agent. Stop Elastic Agent, find the registry file, and remove it. If you start the agent again, all data should be shipped again.

For the pipeline you shared, I'm surprised the docs did not show up at all, because you have an on_failure handler. Even if there was a failure, I would expect the data to be ingested.

An integration per env makes sense. I assume you also set different namespaces to separate the data. The part I don't understand is why both kinds of logs end up in the same pipeline so that you have to drop one of them. I assume these are different hosts or log files, or are the events mixed together?

Thanks @ruflin for your answer. I tried to find the registry file on my Mac with no success (the filebeat folder with the logs was there); I will try the same on the current machine.

I'm also surprised the docs were not going in; I tried locally and the result is the same. I was expecting nothing to happen if the index could not be renamed, but the documents just disappeared. It would be great if you could test this locally and validate it.

Yes, I set different namespaces and the same dataset for each integration. And yes, the logs are mixed together; I have seen this use case twice in the last month, and my concern is processing the same file twice. Index renaming sounded more performant, but I was not able to make it work.

I will just close this one, thanks for your answer @ruflin. We removed the registry and are ingesting data again. I'm not happy about processing everything twice, but it is what it is.

Index splitting plus ignoring logs older than some threshold based on a field should be a must. We see "all history ingested in one day" cases all the time, and it makes sense: people install Fleet, run it against a folder full of logs, and expect ILM to solve everything.
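
Even something as simple as a drop processor on the parsed timestamp would cover most of it. A rough sketch (assuming @timestamp holds the parsed event date as an ISO 8601 string; the cutoff is hard-coded here only to illustrate, ideally it would be relative to ingest time):

{
  "drop": {
    "if": "ctx['@timestamp'] != null && ZonedDateTime.parse(ctx['@timestamp']).isBefore(ZonedDateTime.parse('2021-01-01T00:00:00Z'))",
    "ignore_failure": true
  }
}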

Hi @Gustavo_Llermaly. I'm not fully happy with the solution you had to use :frowning: We are currently working on quite a few efforts around data routing / index splitting, and hopefully we can soon provide a better experience around the custom log integrations.

For ignoring old data: in the context of Fleet and integrations we need to think about ways to make this configurable, agreed.
