[Multiline]Data following newline is ignored

Sjaak01 · August 5, 2019, 6:32am

Hi,

I have a csv file that contains a double-quoted field with an empty line in it. I am trying to convert this to a single line using the multiline codec so I can parse it later on with the csv filter.

csv example

number,Alarm,Date,Message,Details,group,id
100,Warning,2019/12/01,some message. (something),"something, do this;
 1. do something.
 2. something [something] something ""something"" something.
 3. Do [something] for [something] something.

Status: something.",groupname,12345678

But for some reason everything after 3. is ignored and does not show up in the stdout. For testing purposes I removed the double quoted ""something"".

config

input {
  file {
    path => "Alarm.csv"
    sincedb_path => "/dev/null"
    start_position => "beginning"
    type => "me_alerts"
    codec => multiline {
      pattern => "\""
      negate => true
      what => previous
    }
  }
}

stdout

"message" => "100,Warning,2019/12/01,some message. (something),\"something, do this;\n 1. do something.\n 2. something [something] something something something.\n 3. Do [something] for [something] something.",

Everything after the empty line in the csv source field is dropped. Is there a way to solve this?

Badger · August 5, 2019, 12:36pm

I do not think so. The multiline codec is configured to say "if the line does not contain a double quote, then append it to the previous line". That results in the line with the closing " being in a separate event. You would need a stateful approach that says "if I have seen an opening double quote then the next double quote that is not escaped with a second double quote is a closing double quote".
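To make the grouping rule concrete, here is a rough Python simulation of it. This is only a sketch of how I understand `pattern => "\""`, `negate => true`, `what => previous` to behave, not the codec's actual code; `group_lines` and its return shape are made up for illustration. The point is that any line containing a quote starts a new event, so the line holding the closing quote lands in its own buffer, and that buffer stays pending until another quoted line arrives or something flushes it.

```python
from typing import Iterable, List, Tuple


def group_lines(lines: Iterable[str], needle: str = '"') -> Tuple[List[str], str]:
    """Hypothetical model of: pattern => needle, negate => true, what => previous.

    Returns (emitted_events, pending_buffer). The pending buffer is whatever
    is still being accumulated when the input runs out.
    """
    events: List[str] = []
    buffer: List[str] = []
    for line in lines:
        # A line that contains the needle does NOT belong to the previous
        # event, so the current buffer is emitted and a new one is started.
        if needle in line and buffer:
            events.append("\n".join(buffer))
            buffer = []
        # Lines without the needle are appended to the current buffer.
        buffer.append(line)
    return events, "\n".join(buffer)


sample = [
    "number,Alarm,Details,id",
    '100,Warning,"something, do this;',
    " 1. do something.",
    "",
    'Status: something.",12345678',
]
events, pending = group_lines(sample)
for event in events:
    print(repr(event))
print("still buffered:", repr(pending))
```

In this model the `Status: ...` line is never part of the event that carries the rest of the quoted field, which matches the truncated message shown above.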

A codec that could handle that would be useful. There is a csv codec but I do not know if it handles this correctly.
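The stateful rule described above can be sketched in a few lines of Python. This is an illustration of the idea only, not an existing Logstash codec; `split_records` is a hypothetical helper, and it assumes quotes are escaped by doubling them (`""`). Because an escaped quote contributes two quote characters, it is enough to track whether the number of quotes seen so far is odd or even.

```python
from typing import Iterable, Iterator, List


def split_records(lines: Iterable[str]) -> Iterator[str]:
    """Yield one logical CSV record per item, joining physical lines
    that fall inside an open double-quoted field."""
    in_quotes = False
    parts: List[str] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        # Each quote toggles the state; an escaped "" toggles twice and
        # cancels out, so only an odd count changes the state.
        if line.count('"') % 2 == 1:
            in_quotes = not in_quotes
        parts.append(line)
        if not in_quotes:
            yield "\n".join(parts)
            parts = []
    if parts:
        # Input ended inside an unterminated quoted field; emit what we have.
        yield "\n".join(parts)


sample = [
    "number,Alarm,Details,id",
    '100,Warning,"something, do this;',
    ' 2. something ""something"" something.',
    "",
    'Status: something.",12345678',
    "101,Info,short,2",
]
for record in split_records(sample):
    print(repr(record))
```

Unlike the pattern-based grouping, this keeps separate rows separate and keeps the closing quote with its record, even when the field contains `""` or empty lines.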

Sjaak01 · August 5, 2019, 11:38pm

The csv filter doesn't handle the double-quoted fields correctly (it throws exception errors), hence I tried using the multiline codec.

But what I don't understand is why the multiline codec doesn't process the part after the new line. That line doesn't contain a double quote so why is it stopping there?

By the way, thinking about it, if I have multiple rows in the original csv, won't the multiline codec stuff all of them together, creating problems further down the line if I were to apply the csv filter?

Maybe I'd be better off trying to make a script to "fix" the double quoted fields before ingesting the data with Logstash.
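A pre-processing script along those lines could look roughly like the sketch below. It leans on Python's standard `csv` module, which already understands quoted fields with embedded newlines and `""` escapes. The function name `flatten_csv` and the choice to join the pieces with a single space are assumptions for illustration; the output would then be one physical line per row.

```python
import csv
import io


def flatten_csv(text: str, replacement: str = " ") -> str:
    """Rewrite CSV text so that no field contains a line break."""
    reader = csv.reader(io.StringIO(text, newline=""))
    out = io.StringIO(newline="")
    writer = csv.writer(out, lineterminator="\n")
    for row in reader:
        cleaned = []
        for field in row:
            if "\n" in field or "\r" in field:
                # Drop empty lines and join the remaining pieces.
                pieces = [p.strip() for p in field.splitlines() if p.strip()]
                field = replacement.join(pieces)
            cleaned.append(field)
        # The writer re-quotes fields and re-escapes quotes as needed.
        writer.writerow(cleaned)
    return out.getvalue()


sample = (
    "number,Details,id\n"
    '100,"something, do this;\n 1. do ""it"".\n\nStatus: ok.",42\n'
)
print(flatten_csv(sample), end="")
```

For a real file you would read it with `open(path, newline="")` and write the result to a new file that Logstash then picks up.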

Badger · August 6, 2019, 12:31am

I doubt it.

If someone held a gun to my head and told me to fix it I would look at writing a codec that could parse nested double quoted strings.

I am hoping nobody holds a gun to my head. It's hard, but not outrageous.

Sjaak01 · August 6, 2019, 1:33am

If only I knew how to program hehe.

