Why doesn't the Logstash csv filter's skip_header parameter work?

I'm using the Logstash csv filter to parse my CSV file and set the skip_header parameter to true. I expected the first line not to be printed, but the program did not skip it.
Here is my config:

```
input {
  file {
    path => ["/usr/local/air_logs/*"]
    start_position => "beginning"
    sincedb_path => "/dev/null"
  }
}
filter {
  csv {
    skip_header => true
    columns => ["site","parameter","date","year","month","day","hour","value","unit","duration","name"]
    convert => {
      "value" => "integer"
    }
  }
  date {
    match => ["date","yyyy-MM-dd HH:mm", "M/d/yyyy H:mm"]
    target => "date"
  }
}
output {
  stdout {
    codec => "rubydebug"
  }
}
```

Here is my console output:

```
{
      "@version" => "1",
          "site" => "Site",
    "@timestamp" => 2022-05-04T03:24:46.819Z,
         "month" => "Month",
          "hour" => "Hour",
         "value" => "Value",
          "date" => "Date (LST)",
     "parameter" => "Parameter",
           "day" => "Day",
          "path" => "/usr/local/air_logs/Beijing_2008_HourlyPM2.5_created20140325.csv",
      "duration" => "Duration",
          "host" => "0.0.0.0",
          "unit" => "Unit",
          "name" => "QC Name",
          "year" => "Year",
       "message" => "Site,Parameter,Date (LST),Year,Month,Day,Hour,Value,Unit,Duration,QC Name\r",
          "tags" => [
        [0] "_dateparsefailure"
    ]
}
{
      "@version" => "1",
          "site" => "Beijing",
    "@timestamp" => 2022-05-04T03:24:46.877Z,
         "month" => "4",
          "hour" => "15",
         "value" => 207,
          "date" => 2008-04-08T07:00:00.000Z,
     "parameter" => "PM2.5",
           "day" => "8",
          "path" => "/usr/local/air_logs/Beijing_2008_HourlyPM2.5_created20140325.csv",
      "duration" => "1 Hr",
          "host" => "0.0.0.0",
          "unit" => "µg/mg³",
          "name" => "Valid",
          "year" => "2008",
       "message" => "Beijing,PM2.5,2008-04-08 15:00,2008,4,8,15,207,µg/mg³,1 Hr,Valid\r"
}
```

Here is my CSV file:

```
Site,Parameter,Date (LST),Year,Month,Day,Hour,Value,Unit,Duration,QC Name
Beijing,PM2.5,2008-04-08 15:00,2008,4,8,15,207,µg/mg³,1 Hr,Valid
Beijing,PM2.5,2008-04-08 16:00,2008,4,8,16,180,µg/mg³,1 Hr,Valid
```

I'm quite worried about this. Any help is appreciated.

Hi @zhanghao116560, welcome to the community.

That is not how skip_header works, according to the csv filter docs.

It does not simply skip the first line; it skips any line whose values are equal to the configured column names.

If skip_header is set without autodetect_column_names being set then columns should be set which will result in the skipping of any row that exactly matches the specified column values.
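As a rough, untested sketch of that rule applied to this file: skip_header would only drop the header if `columns` listed the header values exactly as they appear in the file, including "Date (LST)" and "QC Name". The lowercase field names could then be restored with a mutate rename before the date filter. This assumes the header row parses to exactly those strings; the rubydebug output above suggests the trailing \r is already gone from the last field ("name" => "QC Name").

```
filter {
  csv {
    skip_header => true
    # Must equal the header row value for value, otherwise the header is not skipped
    columns => ["Site","Parameter","Date (LST)","Year","Month","Day","Hour","Value","Unit","Duration","QC Name"]
    convert => {
      "Value" => "integer"
    }
  }
  # Rename the header-derived field names back to the lowercase names used in the question
  mutate {
    rename => {
      "Site" => "site"
      "Parameter" => "parameter"
      "Date (LST)" => "date"
      "Year" => "year"
      "Month" => "month"
      "Day" => "day"
      "Hour" => "hour"
      "Value" => "value"
      "Unit" => "unit"
      "Duration" => "duration"
      "QC Name" => "name"
    }
  }
  date {
    match => ["date", "yyyy-MM-dd HH:mm", "M/d/yyyy H:mm"]
    target => "date"
  }
}
```

Because every file matched by the `/usr/local/air_logs/*` glob should start with the same header line, this approach would drop the header of each file, not just the first one.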

Since your column names do not exactly match that first row, it is not dropped. It makes sense that you want to rename the columns; that's fine.

But you can drop it pretty simply with something like this after the csv filter:

```
if [site] == "Site" {
  drop { }
}
```
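For placement, here is a sketch of how the original filter section might look with that conditional added (column names unchanged from the question, not tested here). Putting the conditional directly after csv means the header event is discarded before the date filter runs, so it never picks up the _dateparsefailure tag seen in the output above.

```
filter {
  csv {
    # Harmless to keep: it only drops rows exactly equal to `columns`
    skip_header => true
    columns => ["site","parameter","date","year","month","day","hour","value","unit","duration","name"]
    convert => {
      "value" => "integer"
    }
  }
  # The header line parses into site => "Site"; drop that event here
  if [site] == "Site" {
    drop { }
  }
  date {
    match => ["date", "yyyy-MM-dd HH:mm", "M/d/yyyy H:mm"]
    target => "date"
  }
}
```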

Perhaps someone else will have another suggestion.

I see. Thank you very much!
