[Multiline] Data following newline is ignored

Hi,

I have a csv file that contains a double quoted field with an empty line. I am trying to convert this to a single line using the multiline codec so I can filter it later on using the csv filter.

csv example

number,Alarm,Date,Message,Details,group,id
100,Warning,2019/12/01,some message. (something),"something, do this;

  1. do something.
  2. something [something] something ""something"" something.
  3. Do [something] for [something] something.

Status: something.",groupname,12345678

But for some reason everything after 3. is ignored and does not show up in the stdout. For testing purposes I removed the double quoted ""something"".

config

input {
  file {
    path => "Alarm.csv"
    sincedb_path => "/dev/null"
    start_position => "beginning"
    type => "me_alerts"
    codec => multiline {
      pattern => "\""
      negate => true
      what => previous
    }
  }
}

stdout

"message" => "100,Warning,2019/12/01,some message. (something),\"something, do this;\n 1. do something.\n 2. something [something] something something something.\n 3. Do [something] for [something] something.",

Everything after the empty line in the csv source field is dropped. Is there a way to solve this?

I do not think so. The multiline codec is configured to say "if the line does not contain a double quote, then append it to the previous line". That results in the line with the closing " starting a separate event. You would need a stateful approach that says "if I have seen an opening double quote then the next double quote that is not escaped with a second double quote is a closing double quote".
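To illustrate what such a stateful approach could look like outside of Logstash, here is a minimal Python sketch. It is not a Logstash codec, and the function name `join_quoted_records` is made up for this example. It assumes double quotes only ever appear as field delimiters or as doubled escapes (""), so an odd number of quotes on a physical line means a quoted field was opened or closed on that line.

```python
def join_quoted_records(lines):
    """Yield logical CSV records, joining physical lines while a quoted field is open.

    Hypothetical helper for illustration only; assumes quotes are either
    field delimiters or doubled escapes ("").
    """
    buffer = []
    in_quotes = False
    for line in lines:
        line = line.rstrip("\n")
        # Every double quote flips the open/closed state. An escaped pair ""
        # flips it twice, so only an odd count changes the state for the line.
        if line.count('"') % 2 == 1:
            in_quotes = not in_quotes
        buffer.append(line)
        if not in_quotes:
            # No quoted field is open, so the record is complete.
            yield "\n".join(buffer)
            buffer = []
    if buffer:
        # Input ended inside an unterminated quoted field; emit what we have.
        yield "\n".join(buffer)
```

Run against the example above, this should give two records: the header line, and one record holding the whole alarm row including the empty lines and the "Status:" part.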

A codec that could handle that would be useful. There is a csv codec but I do not know if it handles this correctly.

The csv filter doesn't handle the double quoted fields correctly (it throws exception errors), hence I tried using the multiline codec.

But what I don't understand is why the multiline codec doesn't process the part after the new line. That line doesn't contain a double quote so why is it stopping there?

By the way, thinking about it, if I have multiple rows in the original csv, won't the multiline codec stuff all of that together, creating problems further down the line if I were to apply the csv filter?

Maybe I'd be better off trying to make a script to "fix" the double quoted fields before ingesting the data with Logstash.
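For what it's worth, a pre-processing script along those lines could be fairly short. The following is only a sketch using Python's standard `csv` module, which understands quoted fields with embedded newlines and doubled quotes. The names `flatten_field` and `flatten_csv` are made up for this example, and collapsing the line breaks into a single space is an assumption about what is wanted downstream.

```python
import csv


def flatten_field(field, sep=" "):
    """Collapse a multi-line field into one line, dropping empty lines."""
    parts = [part.strip() for part in field.splitlines()]
    return sep.join(part for part in parts if part)


def flatten_csv(src, dst, sep=" "):
    """Copy CSV rows from src to dst with every field reduced to one line.

    src and dst are open text file objects (hypothetical usage: the original
    Alarm.csv and a new output file). Open both with newline="" so the csv
    module sees the embedded line breaks unchanged.
    """
    reader = csv.reader(src)
    # The writer re-quotes fields that contain commas or double quotes.
    writer = csv.writer(dst, lineterminator="\n")
    for row in reader:
        writer.writerow([flatten_field(field, sep) for field in row])
```

The output would then have one physical line per row, so a plain file input without the multiline codec should be enough, and the csv filter gets a normally quoted Details field.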

I doubt it.

If someone held a gun to my head and told me to fix it I would look at writing a codec that could parse nested double quoted strings.

I am hoping nobody holds a gun to my head. It's hard, but not outrageous.

If only I knew how to program, hehe.
