Hi, I need to set different ILM policies based on a log field value (one for PROD, another for NONPROD), so I created an ingest pipeline that renames the `_index` field, as @ruflin suggested in a GitHub issue. It works well when I manually reindex the data stream through the ingest pipeline with the new processors.
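The pipeline does roughly the following (a simplified sketch; the `environment` field and the index names are illustrative placeholders, not the exact config):

```
PUT _ingest/pipeline/route-by-env
{
  "processors": [
    { "set": { "if": "ctx.environment == 'PROD'",    "field": "_index", "value": "logs-myapp-prod" } },
    { "set": { "if": "ctx.environment == 'NONPROD'", "field": "_index", "value": "logs-myapp-nonprod" } }
  ]
}
```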
The problem is that new incoming logs are not being ingested at all, neither under the original index name set in Fleet nor under the renamed index names.
Also, how can I recover the logs I lost during the process? With standalone Filebeat I would remove the registry folder, but I'm not sure how to do that with Fleet, or whether there is a different way to achieve it.
Update on what I did to work around it:

- Created a new integration per environment, so now we have one for PROD and another for NONPROD. This change separates the indices and sends the documents to different ingest pipelines.
- Created two ingest pipelines: one that drops the PROD docs and another that drops the NONPROD docs (see the sketch after this list).
- Adjusted the index templates to pick up these new data streams.
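A sketch of the two drop pipelines (again using the illustrative `environment` field; the real condition depends on how the env is tagged):

```
PUT _ingest/pipeline/logs-nonprod
{
  "processors": [
    { "drop": { "if": "ctx.environment == 'PROD'" } }
  ]
}

PUT _ingest/pipeline/logs-prod
{
  "processors": [
    { "drop": { "if": "ctx.environment == 'NONPROD'" } }
  ]
}
```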
Hi @Gustavo_Llermaly, for Filebeat the registry still exists, but it is now in the data directory of Elastic Agent. Stop Elastic Agent, find the registry file, and remove it. If you start the agent again, all data should be shipped again.
For the pipeline you shared, I'm surprised the docs did not show up at all, because you have on_failure handling. So even if there was a failure, I would expect the data to be ingested.
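(For reference, an on_failure block catches processor failures so the document is still indexed, with the error attached; a minimal illustration with placeholder names:)

```
PUT _ingest/pipeline/route-by-env-safe
{
  "processors": [
    { "set": { "field": "_index", "value": "logs-myapp-{{environment}}" } }
  ],
  "on_failure": [
    { "set": { "field": "error.message", "value": "{{ _ingest.on_failure_message }}" } }
  ]
}
```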
An integration per env makes sense. I assume you also set different namespaces to separate the data. The part I don't understand is why both kinds of logs end up in the same pipeline, so that you have to drop one of them there. Are these different hosts or log files, or are the events mixed together?
Thanks @ruflin for your answer. I tried to find the registry file on my Mac with no success (the filebeat folder with the logs was there). I will try the same on the actual machine.
I'm also surprised the docs were not going in; I tried locally and the result was the same. I expected the pipeline to do nothing if the index could not be renamed, but the documents just disappeared. It would be great if you could test this locally and validate it.
Yes, I set different namespaces, with the same dataset for each integration. And yes, the logs are mixed together. I have seen this use case twice in the last month, and my concern is processing the same file twice. Index renaming sounded more performant, but I was not able to make it work.
I will just close this one, thanks for your answer @ruflin. We removed the registry and are ingesting data again. I'm not happy about processing everything twice, but it is what it is.
Index splitting, plus the ability to ignore logs older than some threshold based on a field, should be a must. We see "all history ingested in one day" cases all the time, and it makes sense: people install Fleet, point it at a folder full of logs, and expect ILM to solve everything.
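Today this can be approximated with a drop processor; a rough sketch (it assumes `@timestamp` is an ISO-8601 date, uses an arbitrary 7-day cutoff, and `event.ingested` is just a scratch field holding the ingest timestamp):

```
PUT _ingest/pipeline/drop-old-logs
{
  "processors": [
    { "set": { "field": "event.ingested", "value": "{{{_ingest.timestamp}}}" } },
    {
      "drop": {
        "if": "ctx['@timestamp'] != null && ZonedDateTime.parse(ctx['@timestamp']).isBefore(ZonedDateTime.parse(ctx.event.ingested).minusDays(7))"
      }
    }
  ]
}
```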
Hi @Gustavo_Llermaly, I'm not fully happy with the solution you had to use. We are currently working on quite a few efforts around data routing / index splitting, and hopefully we can soon provide a better experience around the custom log integrations.
On ignoring old data: agreed, in the context of Fleet and integrations we need to think about ways to make this configurable.