Logstash csv generates an error when \n is in quoted field

The csv plugin for logstash 5.6.3 is generating the following error:

logstash.filters.csv MalformedCSVError: Unclosed quoted field

      "tags" => [
    [0] "_csvparsefailure"
]

The input data are:

field1, field2
aaa, "AB1\nCD2\nEF3\n"

in logstash rubydebug:

 "message" => " aaa, \"AB1\nCD2\nEF3\n\"",

Ugly Fix:
I had to resort to this ugly fix in the logstash config file to keep things running:

filter {
  # at the beginning of the filter section
  mutate { gsub => [ "message", "\n", "\x00" ] }
  csv {
    columns   => [ "field1", "field2" ]
    separator => ","
  }
  ...
  # at the end of the filter section
  mutate { gsub => [ "field2", "\x00", "\n" ] }
}
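The intent of that swap can be sketched in plain Ruby (a sketch only, using Ruby's CSV library directly rather than the filter itself; field names as in the thread):

```ruby
require 'csv'

# An event whose quoted second column contains embedded newlines.
message = "aaa,\"AB1\nCD2\nEF3\n\""

# Mask the newlines so the parser sees a single-line record...
masked = message.gsub("\n", "\x00")

# ...parse the record...
field1, field2 = CSV.parse_line(masked, col_sep: ',', quote_char: '"')

# ...then restore the newlines in the affected column.
field2 = field2.gsub("\x00", "\n")
```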

Result of the Ugly Fix:
In Kibana I get this value for field2:

"AB1\x00CD2\x00EF3\x00"

I am looking for a better solution.
Thanks


The CSV parser does not like the space after the separator.

 aaa, "AB1\nCD2\nEF3\n"
     ^ space here

If every separator is followed by a space, set separator => ", ", i.e. make the separator comma-space.
If not, strip the space before the csv filter with mutate { gsub => [ "message", ", \"", ",\"" ] }
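You can reproduce the effect of that leading space in irb (a sketch; Ruby's CSV library is what the filter uses underneath):

```ruby
require 'csv'

# With a space after the comma, the quote no longer opens the field, so
# Ruby's CSV sees an illegal quote inside an unquoted field and raises.
space_fails = begin
  CSV.parse_line('aaa, "AB1"')
  false
rescue CSV::MalformedCSVError
  true
end

# Without the space the same line parses cleanly.
row = CSV.parse_line('aaa,"AB1"')
```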

Hope this helps

Sorry, that was a typo in my question.
The actual input is
aaa,"AB1\nCD2\nEF3\n"
and the logstash rubydebug output is likewise correct:
"message" => "aaa,\"AB1\nCD2\nEF3\n\"",
There is no issue with the separator.

The issue is with \n in a quoted column.
I used the multiline codec in the Logstash file input section to read the next line as a continuation of the previous one:

# This says that any line not starting with a timestamp should be merged with the previous line.
codec => multiline {
  # Grok pattern names are valid! :)
  pattern => "^\d{4}[-]\d{2}[-]\d{2}"
  negate => true
  what => "previous"
  auto_flush_interval => 5
}
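The merge rule can be mimicked in plain Ruby (a sketch with made-up sample lines; only the leading-timestamp pattern matters):

```ruby
# Hypothetical raw lines: one record whose quoted field spans three lines.
lines = [
  "2017-10-30 12:00:00 aaa,\"AB1",
  "CD2",
  "EF3\""
]

# negate => true, what => "previous": every line that does NOT start with
# a timestamp is appended to the previous event.
pattern = /^\d{4}[-]\d{2}[-]\d{2}/
events = []
lines.each do |line|
  if line =~ pattern || events.empty?
    events << line
  else
    events[-1] << "\n" << line
  end
end
```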

Now I have the \n inside the double quotes.
CSV does not like it.

If I use the Ruby REPL irb and mimic what the csv filter does, I do not get an error.
The csv filter code:

values = CSV.parse_line(source, :col_sep => @separator, :quote_char => @quote_char)

REPL:

source = "aaa,\"bbb\nccc\nddd\""
=> "aaa,\"bbb\nccc\nddd\""
values = CSV.parse_line(source, :col_sep => ',', :quote_char => '"')
=> ["aaa", "bbb\nccc\nddd"]

I can't see why the csv filter is failing; however, try a simple gsub swap using an improbable character sequence:

filter {
  # at the beginning of the filter section
  mutate { gsub => [ "message", "\n", "^±|±^" ] }
  csv {
    columns   => [ "field1", "field2" ]
    separator => ","
  }
  ...
  # at the end of the filter section
  mutate { gsub => [ "field2", "^±|±^", "\n" ] }
}

To keep moving I followed your advice about using gsub.
gsub matches a regular expression against a field value and replaces all matches with a replacement string.

However, the pattern you suggested, "^±|±^", contains two regex metacharacters, ^ and |,
so the restoring substitution at the end
gsub => [ "field2", "^±|±^", "\n" ]
does not work.

Nevertheless, I used an improbable pattern with no metacharacters.
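Plain Ruby shows the problem, since Logstash compiles the gsub pattern as a regular expression (a sketch):

```ruby
s = "AB1^±|±^CD2^±|±^"

# As a regex, ^ anchors to a line start and | alternates, so /^±|±^/ only
# matches a literal ± at the start of a line -- never the placeholder
# sequence itself. The string comes back unchanged.
as_regex = s.gsub(/^±|±^/, "\n")

# Escaping the metacharacters restores the intended literal replacement.
restored = s.gsub(/\^±\|±\^/, "\n")
```

Using Regexp.escape("^±|±^") as the pattern would achieve the same escaping programmatically.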

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.