Logstash: MalformedCSVError

Hi,

Can anybody help me find the right mutate => gsub definition to avoid these warnings/errors:

[WARN ] 2023-08-08 07:05:15.003 [[main]>worker20] csv - Error parsing csv {:field=>"message", :source=>"16600,26200,\" 2004 hat Snow Patrol \\\"Run\\\" rausgebracht und nur vier Jahre später hat diesen Song\"", :exception=>#<CSV::MalformedCSVError: Any value after quoted field isn't allowed in line 1.>}

I've tried:

  mutate {
    gsub => ["message","\"","'"]
  }

but it doesn't work. Thanks!

Welcome to the community!

Assuming your message is the :source string from the warning above, you can use this:

  mutate { gsub => [ "message", '\"', "'" ] }
  mutate { gsub => [ "message", "[\\]", '' ] }

To get:
"message" => "16600,26200,' 2004 hat Snow Patrol 'Run' rausgebracht und nur vier Jahre später hat diesen Song'"
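
The same two substitutions can be tried outside Logstash. The exception class in the warning (CSV::MalformedCSVError) belongs to Ruby's CSV library, so a few lines of plain Ruby are enough to reproduce the failure and check the cleaned line. This is only a sketch: it assumes the raw line in the file is the :source value with one level of escaping removed, `clean` is a made-up helper name, and String#gsub only approximates what the mutate filter does.

```ruby
require "csv"

# Assumed raw line: the :source value from the warning, unescaped once.
RAW = '16600,26200," 2004 hat Snow Patrol \"Run\" rausgebracht und nur vier Jahre später hat diesen Song"'

# Hypothetical helper mirroring the two gsub steps from the reply.
def clean(line)
  line = line.gsub(/"/, "'")   # every double quote becomes an apostrophe
  line.gsub(/[\\]/, "")        # every backslash is removed
end

begin
  CSV.parse_line(RAW)
rescue CSV::MalformedCSVError => e
  # The quoted field is closed by the quote in \" and then more text follows.
  puts "raw line fails: #{e.message}"
end

fields = CSV.parse_line(clean(RAW))
puts fields.length
puts fields.last
```

With the cleaned line the parser no longer sees any quoted field at all, so the line is split purely on commas.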

Thanks, so I need both?

Yes, because without the second gsub you will get:
"message" => "16600,26200,\\' 2004 hat Snow Patrol \\\\\\'Run\\\\\\' rausgebracht und nur vier Jahre später hat diesen Song\\'"

Thanks, so this is my config:

input {
  file {
    path => "/home/ai-upload/vtt/*.csv" 
    start_position => "beginning"
    sincedb_path => "/home/ai-upload/sincedb" 
  }
}

filter {
  csv {
    separator => ","
    columns => ["start", "end", "text", "Timestamp"]
  }
  mutate{ gsub => [ "message", '\"', "'" ] }
  mutate{ gsub => [ "message", "[\\]", '' ] }
  mutate { remove_field => ["path", "host", "@timestamp", "@version"] }
}

output {
  elasticsearch {
    hosts => ["http://localhost:9200"]
    index => "moderationen"
  }
}

I've tested it, but the errors persist:

[WARN ] 2023-08-08 08:23:19.087 [[main]>worker31] csv - Error parsing csv {:field=>"message", :source=>"15000,19000,\" The Boss Hoss kennt ihr schon von ihrem letzten Megahit, nämlich \\\"Little Help\\\".\"", :exception=>#<CSV::MalformedCSVError: Any value after quoted field isn't allowed in line 1.>}

Try it like this, with the replacements first, before the csv filter:

filter {

  mutate{ gsub => [ "message", '\"', "'" ] }
  mutate{ gsub => [ "message", "[\\]", '' ] }

  csv {
    separator => ","
    columns => ["start", "end", "text", "Timestamp"]
  }
}

Thanks, this works, but now the columns are split differently. It seems that the mutation creates a new separator within the line. Can I avoid this?

You have 3 columns:
columns => ["start", "end", "text"]

I don't see the Timestamp column in the samples.

I've checked that. Now the script includes the Timestamp column. I think the cause is the comma within the text column, e.g.:

53760,59640," Wenn ihr wisst, wo es staut oder wo vielleicht geblitzt wird, ruft an 07 32 3 mal die 7 3",Wed 10 May 2023 04:56:33 AM CEST

Is there another mutation necessary?

In that case you would use

    mutate{ gsub => [ "message", "'", '"' ] }
    csv {}

If you also have lines like the first one you shared you may need to make the mutates conditional.
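
One way to read "conditional" here: only lines that contain backslash-escaped quotes get the two substitutions, and every other line goes to the CSV parser untouched, so commas inside a properly quoted text field stay protected. Below is a rough sketch of that logic in plain Ruby rather than Logstash configuration. `parse_row` is a hypothetical name, and testing for `\"` is an assumption about how the two kinds of lines differ.

```ruby
require "csv"

# Hypothetical helper: rewrite only lines that contain \" and then parse.
def parse_row(line)
  if line.include?('\"')
    line = line.gsub(/"/, "'")    # double quotes become apostrophes
    line = line.gsub(/[\\]/, "")  # backslashes are dropped
  end
  CSV.parse_line(line)
end

# Sample lines taken from the thread (raw form assumed, as in the log's :source).
snow_patrol = '16600,26200," 2004 hat Snow Patrol \"Run\" rausgebracht und nur vier Jahre später hat diesen Song"'
quoted      = '53760,59640," Wenn ihr wisst, wo es staut oder wo vielleicht geblitzt wird, ruft an 07 32 3 mal die 7 3",Wed 10 May 2023 04:56:33 AM CEST'
boss_hoss   = '15000,19000," The Boss Hoss kennt ihr schon von ihrem letzten Megahit, nämlich \"Little Help\"."'

puts parse_row(snow_patrol).length  # escaped quotes, no comma in the text
puts parse_row(quoted).length       # left untouched, quoted commas stay in one field
puts parse_row(boss_hoss).length    # escaped quotes and a comma in the text
```

The third sample shows the limit of this approach: once its double quotes are turned into apostrophes, the comma inside the text is no longer protected and the text is split into two fields. A possible alternative, which is not from this thread, would be to rewrite each `\"` as `""`, the doubled-quote form that CSV parsers generally accept inside a quoted field.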
