Send data to the right pipeline from two CSV files

Hello,

I am trying to understand the workflow Filebeat ---> Logstash (beats input) ---> the right pipeline.

For example, say we have two CSV files with the same structure. Both data streams are sent to the beats input listening on port 5044. My question is how Logstash knows which pipeline it should forward the data to. Sorry if I missed something in the documentation.

BR,
Mladen

I’ll try to be more specific.

After adding a second source (/iib/syslogmqsi/iib.log) to filebeat.yml I get an error in my Logstash log. This is part of my filebeat.yml:

> - type: log
> 
>   # Change to true to enable this prospector configuration.
>   enabled: true
> 
>   # Paths that should be crawled and fetched. Glob based paths.
>   paths:
>     - /logs/monitoring/EG_monitoring_kaiibraz.log
>     - /iib/syslogmqsi/iib.log

I have two pipelines: one uses the csv filter and the other uses a grok filter.

This is my csv pipeline:

input {
    beats {
        port => "5044"
    }
}
# The filter section parses the CSV columns and the timestamp.
filter {
    csv {
        columns => [ "date_time", "cpu_utilization", "ram_utilization", "execution_group" ]
        separator => ","
    }
    mutate {convert => ["cpu_utilization", "float"] }
    mutate {convert => ["ram_utilization", "float"] }
    date {
        locale => "en"
        match => ["date_time", "dd-MM-yy;HH:mm:ss"]
        timezone => "Europe/Vienna"
        target => "@timestamp"
    }
}
output {
    elasticsearch {
        hosts => [ "localhost:9200" ]
        index => "kaiibraz"
    }
}

And this is my grok pipeline:

input {
    beats {
        port => "5044"
    }
}
filter {
    grok {
    match => { "message" => "^%{SYSLOGTIMESTAMP:DATE_TIME} %{HOSTNAME:HOSTNAME} %{WORD:SYSTEM}\[%{BASE10NUM:PID}]: IBM Integration Bus %{WORD} \(%{WORD:NODE}.%{WORD:EG}\) \[%{WORD} %{BASE10NUM}] \(%{WORD} %{NOTSPACE}\) %{WORD:CODE}: %{GREEDYDATA:MESSAGE}$" }
    }
    date {
        locale => "en"
        match => ["DATE_TIME", "MMM dd HH:mm:ss"]
        timezone => "Europe/Belgrade"
        target => "@timestamp"
    }
}
output {
    if "_grokparsefailure" in [tags] {
        # write events that didn't match to a file
        file { "path" => "/grok/kaiibraz/grok_log_filter_failures_kaiibraz.txt" }
    }
    else {
        elasticsearch {
        hosts => [ "localhost:9200" ]
        index => "kaiibraz_log"
        }
    }
}

I realised that something was wrong when I found data from the CSV file in /grok/kaiibraz/grok_log_filter_failures_kaiibraz.txt :slight_smile:

BR,
Mladen

Unless you are using pipelines.yml, the configuration files are concatenated: events from each input are sent through every filter and written to every output. You could do something like this, with a different magicvalue in each file.

input {
  beats {
    port => "5044"
    add_field => { "[@metadata][somefield]" => "magicvalue" }
  }
}
filter {
  if "[@metadata][somefield]" == "magicvalue" {
    ...
  }
}
output {
  if "[@metadata][somefield]" == "magicvalue" {
    ...
  }
}

Thank you @Badger for pointing me in the right direction. I didn't have a clue what was happening in the background :slight_smile:. From this post I realised that every data stream is processed by every filter in all of the concatenated configuration files. If we want to control the flow we need to use conditionals. But what happens if we have, for example, 50 servers (50 Filebeat agents) :slight_smile:? What is the best practice in this situation?

Reading articles, I found that the problem with multiple log files can be solved using multiple instances of Filebeat. The post is from here. Could someone explain to me how this solves the problem?

Every prospector in a filebeat instance writes data to the same output. You can add a field to each prospector to identify the type of data (for example, I have a filebeat that collects J9GClog, G1GClog, and apacheaccess). Then in a single pipeline you can have filters that conditionally process each type of log.
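
As a sketch of that approach (the log_type field name and its values are just examples I am making up, not something from your setup), each prospector in filebeat.yml could tag its events:

- type: log
  paths:
    - /logs/monitoring/EG_monitoring_kaiibraz.log
  # extra field identifying the type of data (example name)
  fields:
    log_type: csv
  fields_under_root: true

- type: log
  paths:
    - /iib/syslogmqsi/iib.log
  fields:
    log_type: iib
  fields_under_root: true

and a single Logstash pipeline could branch on that field:

input {
    beats {
        port => "5044"
    }
}
filter {
    if [log_type] == "csv" {
        # your existing csv / mutate / date filters go here
        ...
    } else if [log_type] == "iib" {
        # your existing grok / date filters go here
        ...
    }
}
output {
    if [log_type] == "csv" {
        elasticsearch { hosts => [ "localhost:9200" ] index => "kaiibraz" }
    } else {
        elasticsearch { hosts => [ "localhost:9200" ] index => "kaiibraz_log" }
    }
}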

If you use multiple instances of filebeat, then each one can write to a different output. So, for example, I could send J9GClog to localhost:5044, G1GClog to localhost:5045, and apacheaccess to localhost:5046. Then I would run three pipelines, each with a beat input on a different port, each processing a single type of log with no conditionals.
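
As a sketch of that layout (pipeline ids, paths, and ports are illustrative), pipelines.yml keeps the three configurations isolated, so none of them needs conditionals:

- pipeline.id: j9gclog
  path.config: "/etc/logstash/conf.d/j9gclog.conf"
- pipeline.id: g1gclog
  path.config: "/etc/logstash/conf.d/g1gclog.conf"
- pipeline.id: apacheaccess
  path.config: "/etc/logstash/conf.d/apacheaccess.conf"

Each .conf file has its own beats input (on 5044, 5045, and 5046 respectively), and each filebeat instance points at the matching port in its output section:

output.logstash:
  hosts: ["localhost:5045"]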

Another option is to use a single beat, then use an extra pipeline to do the routing. This might be a big if-else if-else if-else if to route to (for example) tcp inputs on different ports. Or use a translate filter to do the mapping. There are many other ways to dress it up, but the if-else is always there.
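
A sketch of that routing pipeline, assuming (hypothetically) that the events already carry a log_type field and that the downstream pipelines have tcp inputs with codec => json_lines on ports 5045 and 5046:

input {
    beats {
        port => "5044"
    }
}
output {
    if [log_type] == "csv" {
        tcp { host => "localhost" port => 5045 codec => json_lines }
    } else if [log_type] == "iib" {
        tcp { host => "localhost" port => 5046 codec => json_lines }
    } else {
        # anything of an unknown type goes to a file for inspection
        file { path => "/tmp/unrouted_events.txt" }
    }
}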


Thanks a lot for your comments and examples. I now have a better understanding of what is happening during the whole process :slight_smile:.

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.