Implement filebeat with ingest pipelines

Hello,
I have an ELK instance which consists of Elasticsearch, Logstash, and Kibana.

I would like to implement Filebeat with ingest pipelines - meaning Filebeat sends logs with tags, and ingest pipelines recognize them and create an index.

Please correct me if I'm wrong about whether such an implementation will work.

I created Filebeat with this config:

    output:
      elasticsearch:
        enabled: true
        hosts: ["http://es1:9200"]
        timeout: 60
        #ssl.certificate_authorities: ["/etc/ssl/certs/logstash-cert-ca.tchaws.amadeus.net.pem"]
        #ssl.certificate: "/etc/ssl/certs/logstash-cert.tchaws.amadeus.net.pem"
        #ssl.key: "/etc/ssl/certs/logstash-key.tchaws.amadeus.net.key"      
    filebeat.inputs:
      - type: log
        tags: [dpkg]
        enabled: true
        paths:
          - /opt/dpkg.log

so Filebeat has its output defined to Elasticsearch and sends logs with tags.

Inside Elasticsearch I created a pipeline (with a grok processor).

This is a test grok pattern, so to get everything working I decided to use only GREEDYDATA.

But I don't know how I can create an index from this pipeline.

In my understanding,
I have Filebeat which sends logs with tags, and the ingest pipeline has a grok processor for the logs.

Hi @dominbdg

Use the pipeline configuration

I implemented the Filebeat output as you said:

    output:
      elasticsearch:
        enabled: true
        hosts: ["http://es1:9200"]
        timeout: 60
        pipeline: "filebeat-1"
        #ssl.certificate_authorities: ["/etc/ssl/certs/logstash-cert-ca.tchaws.amadeus.net.pem"]
        #ssl.certificate: "/etc/ssl/certs/logstash-cert.tchaws.amadeus.net.pem"
        #ssl.key: "/etc/ssl/certs/logstash-key.tchaws.amadeus.net.key"
    filebeat.inputs:
      - type: log
        tags: [dpkg]
        enabled: true
        paths:
          - /opt/dpkg.log

but I still don't have this index created in Elasticsearch.
I checked the connection and Filebeat is connected to Elasticsearch.

I still don't understand how I can create indexes using ingest pipelines.
Maybe I'm not going about it the right way, because in my understanding,
Filebeat has the pipeline name and points to Elasticsearch, and in the ingest pipeline I have created a grok.

In my environment in Logstash, I have an input (listening to Filebeat), a grok pattern, and an output.
I cannot find anything equivalent to the output (which creates the index) in ingest pipelines.

Ok, let's back up a bit.

What version are you on?

How did you install filebeat?

Ingest pipelines do not create indices... They are used to transform data before writing it to an index. Ingest pipelines live in Elasticsearch.
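For illustration only (my-index and my-pipeline are hypothetical names): a pipeline is referenced at write time and transforms the document on its way into whatever index or data stream the write targets.

    PUT my-index/_doc/1?pipeline=my-pipeline
    {
      "message": "a raw, unparsed log line"
    }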

The default data stream (a collection of indices) from Filebeat will be something like filebeat-8.17.4

Ok, well, your Filebeat is not configured to point to Logstash, it is pointing to Elasticsearch.
So that is another issue.
The ingest pipeline you showed above is not in Logstash, that is in Elasticsearch.

Ingest pipelines are in Elasticsearch
Logstash pipelines are in Logstash
They both transform data, but they are 2 different architectural patterns.
Why are you using Logstash, do you need to? It adds complexity if you are just getting started.

So you need to decide if you want

A) Filebeat -> Elasticsearch (with ingest pipeline)

B) Filebeat -> Logstash (logstash pipeline) -> Elasticsearch

C) Filebeat -> Logstash (passthrough) -> Elasticsearch (with ingest pipeline)

If you are just getting started A is the simplest.

So what are you trying to do? and why?

Have you looked at the Filebeat quickstart?

There are commands to test the configurations as well.

Also, once Filebeat reads a file it will not read it again, because it keeps track of what it has read. So if you want to test and re-read the same file over again, you will need to clean up the data registry.
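If you need to force a re-read, a minimal sketch for a Linux package install (the registry location varies by install type; for deb/rpm packages it lives under /var/lib/filebeat):

    # stop Filebeat, clear the registry so files are read again from the beginning, restart
    sudo systemctl stop filebeat
    sudo rm -rf /var/lib/filebeat/registry
    sudo systemctl start filebeat

Keep in mind this will re-ship everything Filebeat has already read, so expect duplicates unless you also delete the data that was already indexed.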

I'm happy that I accidentally found that I have data from the Filebeat side in a data stream.
But I don't know how to name this data stream.
I have "filebeat-8.11.1", which is the version of my Filebeat.

I found that a template is created automatically.
How can I name this data stream? Probably from the Filebeat side, but I couldn't find anything.
In Logstash I had indexes created weekly or monthly. I don't know if it is possible to use such settings here.

But basically I'm happy to have a data stream from this implementation.

In my Filebeat config I have the pipeline "filebeat-1" and I expected the data stream would get the same name.

Ok, sorry for the late reply - here is my basic explanation.
I'm not starting my adventure with ELK - I'm actually quite deeply skilled in ELK,
but my current environment is Elasticsearch + Logstash + Kibana.

My concept is to get rid of Logstash, and I want to go with Elasticsearch with ingest pipelines and Kibana.

Someone before me installed this environment, and that person probably didn't have experience with ingest pipelines.

This is my first shot with ingest pipelines and I'm working to adapt them to my environment.
For me a pipeline in Elasticsearch is much clearer than in Logstash.
On top of that, Logstash is much more complicated when it comes to grok patterns, and in my case it requires a lot of resources.

Right now I'm learning about ingest pipelines, but they already look much better to me than Logstash.

In my case Logstash is used mostly with flat log files (not JSON files), but when I put in a very simple grok like GREEDYDATA, ingest pipelines also work with that kind of file.

Okay in general though...

Ingest pipelines do not set the name of the index or data stream... They are used to transform data.

I would read about data streams.
How frequently the backing indices roll over, and how big they get, is governed by the index lifecycle management (ILM) policy applied to the data stream.

There is a default ILM policy for filebeat
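If you want to check what was created, a quick look in Kibana Dev Tools (assuming the default Filebeat setup, which creates an ILM policy named filebeat and a versioned filebeat-* index template):

    GET _ilm/policy/filebeat
    GET _index_template/filebeat-*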

Also, 8.11 is pretty old. You should consider upgrading at some point

Most likely your /opt/dpkg.log has already been read and recorded in the file registry (log.json in the /var/lib/filebeat/ directory).

How do you create an index by using ingest pipelines?
As you started: create the ingest pipeline, point FB to ES, and check the data in Kibana.

  1. Update filebeat.yml, add:

    filebeat.inputs:
      - type: filestream # log is obsolete
        enabled: true
        tags: [dpkg]
        paths:
          - /opt/dpkg.log

    setup.template:
      name: "template-test"
      pattern: "template-test*"
      enabled: true

    output.elasticsearch:
      enabled: true
      hosts: ["http://es1:9200"]
      index: "test-%{[agent.version]}" # change how you call it
      timeout: 60
      pipeline: "name-test" # change how you call it

    output.console: # useful to make sure FB reads the log; if you use it, disable output.elasticsearch
      pretty: true
      enabled: false

If you do not set the index name, your data will go to filebeat-%{[agent.version]}.

  2. Make the grok or any other processor; it should look like this:

    PUT _ingest/pipeline/name-test
    {
      "description": "Test Pipeline",
      "processors": [
        {
          "grok": {
            "field": "message",
            "patterns": [ "%{GREEDYDATA:msg}" ]
          }
        }
      ]
    }

Use the Kibana UI if that is easier. You can also set only 2-3 fields, a timestamp and a greedy msg, just to make sure the grok is parsing. Building the grok pattern outside the pipeline first can be easier.
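One way to test the grok without indexing anything is the simulate API; a quick sketch, assuming the pipeline above is named name-test:

    POST _ingest/pipeline/name-test/_simulate
    {
      "docs": [
        { "_source": { "message": "<your line from the log>" } }
      ]
    }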

  3. Make a test index and insert test data:

    POST /test/_doc/
    {
        "message": "<your line from the log>"
    }
  4. Create an index template under Index Management. There is no need to specify fields, ES will do it for you; just name the template and pattern as you set them in filebeat.yml.

  5. Test filebeat.yml:

    filebeat.exe test config

  6. Run it:

    filebeat.exe -e

  7. If there is an error it will show on screen and in the log. You can set the log level to debug if you wish, but -e is enough.
  8. Create a data view in Kibana; it should point to the test-* index, or however you named it.
  9. Check the data in Discover.
  10. Improve your grok to parse the data into fields.

Remember, logs are your best friends, and RTFM.

Many thanks for that long reply.
This time I have an issue - Filebeat is logging:

{"log.level":"warn","@timestamp":"2025-05-24T18:33:58.262Z","log.logger":"elasticsearch","log.origin":{"function":"github.com/elastic/beats/v7/libbeat/outputs/elasticsearch.NewClient.func3","file.name":"elasticsearch/client.go","file.line":175},"message":"Failed to index 1600 events in last 10s: events were dropped! Look at the event log to view the event and cause.","service.name":"filebeat","ecs.version":"1.6.0"}

In the logs I found only:

"message":"Set settings.index.lifecycle.name in template to filebeat as ILM is enabled.","service.name":"filebeat","ecs.version":"1.6.0"}

But the thing is that I have this configured:

    {
      "index": {
        "lifecycle": {
          "name": "filebeat"
        },

I don't really know what the issue is with that.

You should have an events directory in the Filebeat logs directory... those logs will show the specific error, which is likely a mapping error.

That all looks fine to me

If you don't want to use ILM, you can disable it at the root level, above setup.template:


    setup.ilm.enabled: false

    setup.template:
    ...

Hello,
Thanks for Your answer.
I solved the issue by deleting its template and recreating the pipeline with another name.

I have a question because I don't understand this: one time an index was created, and with another Filebeat implementation a data stream was created.

I don't know why that is happening.


I'm not sure we're going to be able to give you a specific answer, because it's not really clear what exact steps you took each time.

This is why I suggest starting with defaults and getting the default behavior and then adjusting.


As Stephen points out, unclear feedback will not get you useful help.

From Qantas airline repair logs:
Pilot: Aircraft handles funny.
Maintenance engineers: Aircraft warned to straighten up, fly right, and be serious.


Ok, I will try to be clearer.

I have a Filebeat config for the dpkg log:

    output:
      elasticsearch:
        enabled: true
        hosts: ["http://es1:9200"]
        timeout: 60
        index: "dpkg-%{+YYYY.MM}"
        pipeline: "filebeat-dpkg"
        action: create
    #manage_template:
    setup.template:
      name: "filebeat-dpkg"
      pattern: "dpkg-*"
    enabled: true
    setup.ilm.enabled: false
    #index.lifecycle.name: "filebeat"
    filebeat.inputs:
      - type: filestream
        id: "filebeat-dpkg"
        tags: [dpkg]
        enabled: true
        paths:
          - /opt/dpkg.log

I created a pipeline for it with the grok pattern below:

    filebeat-dpkg
    Processors: [ { "grok": { "field": "message", "patterns": [ "%{GREEDYDATA}" ], "tag": "dpkg" } } ]

I removed the template - it will be created automatically,
and I have a "dpkg-2025.05" data stream.

I have configuration for sddm.log:

    output:
      elasticsearch:
        enabled: true
        hosts: ["http://es1:9200"]
        timeout: 60
        index: "sddm-%{+YYYY.MM}"
        pipeline: "filebeat-sddm"
        action: create
    setup.template:
      name: "filebeat-sddm"
      pattern: "filebeat-sddm-*"
    enabled: true
    setup.ilm.enabled: false
    #index.lifecycle.name: "93-Day-Retention-Policy"
    filebeat.inputs:
      - type: filestream
        id: "filebeat-sddm"
        tags: [sddm]
        enabled: true
        paths:
          - /opt/sddm2.log

The same - I removed the index template and created a pipeline 'filebeat-sddm' for it,
and I have an index 'sddm-2025.05', not a data stream.

I completely cannot figure out why one log ends up in an index and the other in a data stream.

First, do not add the date to the data stream name. That is an anti-pattern and does not make sense, since the backing indices will have the date.

It should be something like a plain name, without the date.

I believe the reason for your issue is:

    output:
      elasticsearch:
        enabled: true
        hosts: ["http://es1:9200"]
        timeout: 60
        index: "dpkg-%{+YYYY.MM}". <<<< THIS HERE 
        pipeline: "filebeat-dpkg"
        action: create
    #manage_template:
    setup.template:
      name: "filebeat-dpkg"
      pattern: "dpkg-*" <<<< MATCHES THIS PATTERN SO TEMPLATE IS APPLIED 

But here

    output:
      elasticsearch:
        enabled: true
        hosts: ["http://es1:9200"]
        timeout: 60
        index: "sddm-%{+YYYY.MM}" <<< THIS HERE 
        pipeline: "filebeat-sddm"
        action: create
    setup.template:
      name: "filebeat-sddm"
      pattern: "filebeat-sddm-*" <<<< DOES NOT MATCHES THIS PATTERN SO TEMPLATE IS ***NOT*** APPLIED AND SO A SIMPLE DEFAULT INDEX IS CREATED  
    enabled: true

Details 🙂
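For example, a consistent pairing could look like the sketch below (the names are only illustrative); the point is that the value of index matches the setup.template pattern:

    output.elasticsearch:
      hosts: ["http://es1:9200"]
      index: "dpkg-prod"          # data stream name, no date
      pipeline: "filebeat-dpkg"
    setup.ilm.enabled: false
    setup.template:
      enabled: true
      name: "filebeat-dpkg"
      pattern: "dpkg-*"           # matches "dpkg-prod", so the template is applied
    filebeat.inputs:
      - type: filestream
        id: "filebeat-dpkg"
        tags: [dpkg]
        enabled: true
        paths:
          - /opt/dpkg.log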


Many thanks for that.
Sometimes I'm blind to such details.

Ok, I corrected my configuration:

    output:
      elasticsearch:
        hosts: ["http://es1:9200"]
        timeout: 60
        index: "dpkg"
        pipeline: "filebeat-dpkg-5"
        #action: create
        #data_stream: false
        #manage_template:
    setup.template:
      name: "filebeat-dpkg-5"
      pattern: "dpkg-*"
    enabled: true
    setup.ilm.enabled: false
        #index.lifecycle.name: "filebeat"
    filebeat.inputs:
      - type: filestream
        #id: "filebeat-dpkg-5"
        tags: [dpkg]
        enabled: true
        paths:
          - /opt/dpkg.log

I don't know why it is creating an index for me and not a data stream.
But the funny thing is that when I change it to something like "dpkg-prod" it creates a data stream.
Something somewhere remembers this dpkg index, but I don't know where.

I removed the old pipeline for it, and removed Filebeat from ILM because I'm not using it.
I don't know where else to look so that I get a data stream like the rest, not an index.

The index name does not match the pattern... they do not match.

Try

index: "dpkg-prod"

The pattern is meant to match the data stream name, not the backing indices... So your pattern has a -*, but the data stream name you're setting does not...

Yes, it's a little confusing that Filebeat still says index, but it's really the thing you're writing to, which in this case is a data stream.

You might also want to look at this...

Probably the way I would do this is to ingest into a logs data stream and then add the pipeline.

I will try it - many thanks for that. Indeed, my filebeat.inputs is not as complete as this example.
