Logstash: MalformedCSVError

Hi,

Can anybody help me find the right mutate => gsub definition to avoid these warnings/errors:

[WARN ] 2023-08-08 07:05:15.003 [[main]>worker20] csv - Error parsing csv {:field=>"message", :source=>"16600,26200,\" 2004 hat Snow Patrol \\\"Run\\\" rausgebracht und nur vier Jahre später hat diesen Song\"", :exception=>#<CSV::MalformedCSVError: Any value after quoted field isn't allowed in line 1.>}

I've tried:

  mutate {
    gsub => ["message","\"","'"]
  }

but it doesn't work. Thanks!

Welcome to the community!

I have assumed this is your message:

You can use this:

 	mutate{ gsub => [ "message", '\"', "'" ] }
 	mutate{ gsub => [ "message", "[\\]", '' ] }

To get:
"message" => "16600,26200,' 2004 hat Snow Patrol 'Run' rausgebracht und nur vier Jahre später hat diesen Song'"

Thanks, so I need both?

Yes, because without the second gsub you will get:
"message" => "16600,26200,\\' 2004 hat Snow Patrol \\\\\\'Run\\\\\\' rausgebracht und nur vier Jahre später hat diesen Song\\'"

Thanks, so this is my config:

input {
  file {
    path => "/home/ai-upload/vtt/*.csv" 
    start_position => "beginning"
    sincedb_path => "/home/ai-upload/sincedb" 
  }
}

filter {
  csv {
    separator => ","
    columns => ["start", "end", "text", "Timestamp"]
  }
  mutate{ gsub => [ "message", '\"', "'" ] }
  mutate{ gsub => [ "message", "[\\]", '' ] }
  mutate { remove_field => ["path", "host", "@timestamp", "@version"] }
}

output {
  elasticsearch {
    hosts => ["http://localhost:9200"]
    index => "moderationen"
  }
}

I've tested it, but the errors persist:

[WARN ] 2023-08-08 08:23:19.087 [[main]>worker31] csv - Error parsing csv {:field=>"message", :source=>"15000,19000,\" The Boss Hoss kennt ihr schon von ihrem letzten Megahit, nämlich \\\"Little Help\\\".\"", :exception=>#<CSV::MalformedCSVError: Any value after quoted field isn't allowed in line 1.>}

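Try it like this, with the replacements first. Filters run in the order they are listed, so the gsub has to happen before csv parses the message: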

filter {

  mutate{ gsub => [ "message", '\"', "'" ] }
  mutate{ gsub => [ "message", "[\\]", '' ] }

  csv {
    separator => ","
    columns => ["start", "end", "text", "Timestamp"]
  }
}

Thanks, this works, but now the columns are split differently. It seems that the mutation generates a new line separator. Can I avoid this?

You have 3 columns:
columns => ["start", "end", "text"]

I don't see the Timestamp column in samples.

I've checked that. Now the script includes the column Timestamp. I think it is the comma within the text column, e.g.:

53760,59640," Wenn ihr wisst, wo es staut oder wo vielleicht geblitzt wird, ruft an 07 32 3 mal die 7 3",Wed 10 May 2023 04:56:33 AM CEST

Is another mutation necessary?

In that case you would use

    mutate{ gsub => [ "message", "'", '"' ] }
    csv {}

If you also have lines like the first one you shared, you may need to make the mutates conditional.
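
As a rough sketch of what conditional mutates could look like (not tested against your data): the condition below checks for a backslash-escaped quote (\") in the message and only then rewrites the quotes. Lines that are already valid CSV, such as the 53760,59640 sample, reach csv with their double quotes intact. The regular expression in the condition is my own assumption about how to tell the two kinds of line apart. The gsub patterns are the ones from earlier in the thread and assume the default config.support_escapes: false setting.

```
filter {
  # Rewrite quotes only when the line contains a backslash-escaped quote (\"),
  # as in the first sample. This condition is an assumption, adjust as needed.
  if [message] =~ /\\"/ {
    mutate { gsub => [ "message", '\"', "'" ] }
    mutate { gsub => [ "message", "[\\]", '' ] }
  }

  # All other lines are passed to csv unchanged, so commas inside the
  # double-quoted text column are not treated as separators.
  csv {
    separator => ","
    columns => ["start", "end", "text", "Timestamp"]
  }
}
```

A line that contains both escaped quotes and commas inside the text column would still be split into too many columns with this approach, so it is worth checking the output for such lines.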