Why doesn't the Logstash csv filter skip_header parameter work?

zhanghao116560 · May 4, 2022, 3:44am

I'm using the Logstash csv filter to parse my CSV file, and I set the skip_header parameter to true. I expected the first line (the header) not to be printed, but the header row was not skipped.
Here is my config:

input {
  file {
    path => ["/usr/local/air_logs/*"]
    start_position => "beginning"
    sincedb_path => "/dev/null"
  }
}
filter {
  csv {
    skip_header => true
    columns => ["site","parameter","date","year","month","day","hour","value","unit","duration","name"]
    convert => {
      "value" => "integer"
    }
  }
  date {
    match => ["date","yyyy-MM-dd HH:mm", "M/d/yyyy H:mm"]
    target => "date"
  }
}
output {
  stdout {
    codec => "rubydebug"
  }
}

Here is my console output:

{
      "@version" => "1",
          "site" => "Site",
    "@timestamp" => 2022-05-04T03:24:46.819Z,
         "month" => "Month",
          "hour" => "Hour",
         "value" => "Value",
          "date" => "Date (LST)",
     "parameter" => "Parameter",
           "day" => "Day",
          "path" => "/usr/local/air_logs/Beijing_2008_HourlyPM2.5_created20140325.csv",
      "duration" => "Duration",
          "host" => "0.0.0.0",
          "unit" => "Unit",
          "name" => "QC Name",
          "year" => "Year",
       "message" => "Site,Parameter,Date (LST),Year,Month,Day,Hour,Value,Unit,Duration,QC Name\r",
          "tags" => [
        [0] "_dateparsefailure"
    ]
}
{
      "@version" => "1",
          "site" => "Beijing",
    "@timestamp" => 2022-05-04T03:24:46.877Z,
         "month" => "4",
          "hour" => "15",
         "value" => 207,
          "date" => 2008-04-08T07:00:00.000Z,
     "parameter" => "PM2.5",
           "day" => "8",
          "path" => "/usr/local/air_logs/Beijing_2008_HourlyPM2.5_created20140325.csv",
      "duration" => "1 Hr",
          "host" => "0.0.0.0",
          "unit" => "µg/mg³",
          "name" => "Valid",
          "year" => "2008",
       "message" => "Beijing,PM2.5,2008-04-08 15:00,2008,4,8,15,207,µg/mg³,1 Hr,Valid\r"
}

Here is my CSV file:

Site,Parameter,Date (LST),Year,Month,Day,Hour,Value,Unit,Duration,QC Name
Beijing,PM2.5,2008-04-08 15:00,2008,4,8,15,207,µg/mg³,1 Hr,Valid
Beijing,PM2.5,2008-04-08 16:00,2008,4,8,16,180,µg/mg³,1 Hr,Valid

This is really worrying me.

stephenb · May 4, 2022, 4:14am

Hi @zhanghao116560, welcome to the community.

That is not how skip_header works, per the csv filter docs.

It does not simply skip the first line; it skips any line whose values are equal to the column names.

If skip_header is set without autodetect_column_names being set then columns should be set which will result in the skipping of any row that exactly matches the specified column values.

Since your column names do not exactly match that first row, it is not dropped. It makes sense that you want to rename the columns; that's fine.

But you can drop the header row pretty simply, with something like this after the csv filter:

if [site] == "Site" {
  drop { }
}
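
For context, here is a rough sketch of where that conditional could sit in the filter block from the question. It goes after the csv filter, so that [site] already exists, and before the date filter, so the header row never reaches it and picks up a _dateparsefailure tag. Everything except the conditional and the comments is copied from the original config. I left out skip_header because it has no effect with these column names.

```
filter {
  csv {
    columns => ["site","parameter","date","year","month","day","hour","value","unit","duration","name"]
    convert => {
      "value" => "integer"
    }
  }
  # The header row parses like any other row, so its "site" field
  # holds the literal header text "Site". Drop that event here.
  if [site] == "Site" {
    drop { }
  }
  # Only data rows get this far.
  date {
    match => ["date","yyyy-MM-dd HH:mm", "M/d/yyyy H:mm"]
    target => "date"
  }
}
```

Note that this would also drop a real data row whose site value happened to be exactly "Site", which seems unlikely for this file.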

Perhaps someone else will have another suggestion
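
One other possibility, as an untested sketch: make columns match the header row exactly so that skip_header can recognise it, then rename the fields afterwards with a mutate filter. The header names below are taken from the CSV sample in the question and the short names from the original config. I am assuming the trailing \r visible in message does not end up in the parsed values (the rubydebug output above shows "QC Name" without it), so the comparison should still match.

```
filter {
  csv {
    # These names must equal the header row value for value;
    # otherwise skip_header will not recognise the row.
    skip_header => true
    columns => ["Site","Parameter","Date (LST)","Year","Month","Day","Hour","Value","Unit","Duration","QC Name"]
    convert => {
      "Value" => "integer"
    }
  }
  # Rename to the short field names used in the original config.
  mutate {
    rename => {
      "Site" => "site"
      "Parameter" => "parameter"
      "Date (LST)" => "date"
      "Year" => "year"
      "Month" => "month"
      "Day" => "day"
      "Hour" => "hour"
      "Value" => "value"
      "Unit" => "unit"
      "Duration" => "duration"
      "QC Name" => "name"
    }
  }
  date {
    match => ["date","yyyy-MM-dd HH:mm", "M/d/yyyy H:mm"]
    target => "date"
  }
}
```

This is more config than the conditional drop, but the header is removed by the csv filter itself, so no later filter ever sees it.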

zhanghao116560 · May 4, 2022, 6:18am

I see. Thank you very much.

system · June 1, 2022, 6:19am

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
