Ingest pipeline with conditions passes tests correctly, but no documents are processed

Hi;

Elasticsearch version: 8.6.2
Logstash version: 8.6.2

I have created and tested a simple ingest pipeline that adds an event.ingested field to Filebeat documents that do not already have this field set by other Filebeat pipelines.

All events pass through Logstash before being indexed on the Elasticsearch ingest nodes.

The pipeline is configured like this:

{
  "set_event_ingested": {
    "description": "Add a event.ingested field to filebeat events with the value _ingest.timestamp if the field is not set in the document",
    "processors": [
      {
        "set": {
          "field": "event.ingested",
          "value": "{{_ingest.timestamp}}",
          "if": "ctx?.agent?.type == 'filebeat' && !ctx.containsKey('event.ingested')"
        }
      }
    ],
    "version": 1,
    "on_failure": [
      {
        "append": {
          "field": "error.message",
          "value": [
            "{{ _ingest.on_failure_message }}"
          ]
        }
      }
    ]
  }
}

I have tested the pipeline from Kibana with documents that fulfill the condition and with documents that do not. The pipeline works as expected.
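For reference, a test like that can be run from Kibana Dev Tools with the _simulate API; the sample documents below are hypothetical:

POST _ingest/pipeline/set_event_ingested/_simulate
{
  "docs": [
    { "_source": { "agent": { "type": "filebeat" }, "message": "no event.ingested yet" } },
    { "_source": { "agent": { "type": "filebeat" }, "event": { "ingested": "2023-01-01T00:00:00Z" }, "message": "field already set" } }
  ]
}

The response shows the documents as they would look after the pipeline runs, so it is easy to check that the set processor fires only for the first document.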

However, looking at _nodes/stats, it appears the pipeline never executes:

         "set_event_ingested": {
            "count": 0,
            "time_in_millis": 0,
            "current": 0,
            "failed": 0,
            "processors": [
              {
                "set": {
                  "type": "conditional",
                  "stats": {
                    "count": 0,
                    "time_in_millis": 0,
                    "current": 0,
                    "failed": 0
                  }
                }
              }
            ]
          },

I have searched other blog posts for similar problems but cannot find a solution.

I have loaded many Filebeat pipelines into Elasticsearch, so with the default pipelines around 200 ingest pipelines are loaded. However, several of these pipelines no longer process any events, since the Beats have been upgraded from version 7.15 to 7.17.

Could this impact the ingest pipeline I have created? How can I force Elasticsearch to apply the set_event_ingested pipeline to all Filebeat documents that fulfill the condition?

Best regards
Flemming

It seems like your ingest pipeline is correctly defined but it's not being used. In Elasticsearch, an ingest pipeline is not automatically applied to all incoming documents. You need to specify the pipeline during the index or bulk request.
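For example (the index name is hypothetical), the pipeline can be passed as a query parameter on an index or bulk request:

PUT my_index/_doc/1?pipeline=set_event_ingested
{
  "agent": { "type": "filebeat" },
  "message": "example document"
}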

Since you're using Logstash to ingest data into Elasticsearch, you need to specify the pipeline in your Logstash Elasticsearch output configuration. Here's an example:

output {
  elasticsearch {
    hosts => ["localhost:9200"]
    index => "my_index"
    pipeline => "set_event_ingested"
  }
}

This configuration tells Logstash to use the "set_event_ingested" pipeline when indexing documents into Elasticsearch. Remember to restart Logstash after making changes to its configuration.

Hi @Opster_support ;

Thanks, I will try that.

However, in the Logstash output I do not specify the various Filebeat ingest pipelines, e.g.:

         "filebeat-7.15.0-apache-access-pipeline": {
            "count": 20138,
            "time_in_millis": 1583,

Why does this work without the pipeline specification? And will these pipelines still be applied to the events from the modules enabled in some filebeats?

Best regards
Flemming

The reason why the "filebeat-7.15.0-apache-access-pipeline" is being applied without being specified in the Logstash output is likely because it's being set in Filebeat itself.

Filebeat has a feature where it can set the ingest pipeline to be used for each event. This is typically used when you enable modules in Filebeat. Each module can specify a default ingest pipeline to process its data.

When Filebeat sends data to Logstash, it includes the pipeline name in the metadata of each event. Logstash then forwards this metadata along with the event to Elasticsearch, which uses the specified pipeline to process the event.
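As a sketch, a Logstash output that honours the pipeline name Filebeat places in the event metadata typically looks like this (the hosts and index values are assumptions):

output {
  elasticsearch {
    hosts => ["localhost:9200"]
    index => "my_index"
    pipeline => "%{[@metadata][pipeline]}"
  }
}

With this form, each event is processed by whatever pipeline its Filebeat module requested, rather than one fixed pipeline for all events.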

If you specify a pipeline in the Logstash output, it will override the pipeline set by Filebeat. So, if you want to apply the "set_event_ingested" pipeline to all events, but also want to keep the pipelines set by Filebeat modules, you might need to create a new pipeline that first calls your "set_event_ingested" pipeline and then the module pipeline.

Here's an example of how you can do this:

PUT _ingest/pipeline/set_event_ingested_and_module_pipeline
{
  "description" : "first apply set_event_ingested, then the module pipeline",
  "processors" : [
    {
      "pipeline" : {
        "name" : "set_event_ingested"
      }
    },
    {
      "pipeline" : {
        "name" : "filebeat-7.15.0-apache-access-pipeline"
      }
    }
  ]
}

Then, in your Logstash output, you would specify this new pipeline:

output {
  elasticsearch {
    hosts => ["localhost:9200"]
    index => "my_index"
    pipeline => "set_event_ingested_and_module_pipeline"
  }
}

This way, both your "set_event_ingested" pipeline and the module pipeline will be applied to the events.

Please note I used opsgpt.io to build the examples here.

First, thanks for pointing me to opsgpt.io. I'll take a closer look at that AI.

Yes, that is correct: Filebeat events from modules set the ingest pipeline the module requires. I have noticed this when we upgrade the Filebeats, and I have to load the corresponding ingest pipelines in advance.

Our use case for Filebeat and the other Beats modules that use ingest pipelines is a bit complex. When the Beats are upgraded, they are distributed as apt and Chocolatey packages to many different hosts in the infrastructure. I cannot control exactly when a Beat is upgraded, so for a period at least 2 versions of each pipeline have to be supported, sometimes more than 2. I should be able to resolve this by adding all required pipelines to your example set_event_ingested_and_module_pipeline. This solution requires some maintenance, but it should be possible.
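A sketch of such a combined pipeline, with hypothetical version-specific pipeline names and conditions on the agent version, might look like:

PUT _ingest/pipeline/set_event_ingested_and_module_pipelines
{
  "description": "apply set_event_ingested, then the module pipeline matching the Beat version",
  "processors": [
    { "pipeline": { "name": "set_event_ingested" } },
    {
      "pipeline": {
        "name": "filebeat-7.15.0-apache-access-pipeline",
        "if": "ctx?.agent?.version == '7.15.0'"
      }
    },
    {
      "pipeline": {
        "name": "filebeat-7.17.0-apache-access-pipeline",
        "if": "ctx?.agent?.version == '7.17.0'"
      }
    }
  ]
}

Each new Beat version would need its own conditional pipeline processor added here, which is the maintenance cost mentioned above.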
