I have a CSV file that contains a double-quoted field with an empty line inside it. I am trying to convert this to a single line using the multiline codec so that I can parse it later with the csv filter.
CSV example:
number,Alarm,Date,Message,Details,group,id
100,Warning,2019/12/01,some message. (something),"something, do this;
I do not think so. The multiline codec is configured to say "if the line does not contain a double quote, then append it to the previous line". That results in the closing " ending up in a separate event. You would need a stateful approach that says "if I have seen an opening double quote, then the next double quote that is not escaped with a second double quote is the closing double quote".
A codec that could handle that would be useful. There is a csv codec but I do not know if it handles this correctly.
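To make the stateful idea concrete, here is a minimal Python sketch of it, run outside Logstash. It is not a Logstash codec, and the function name `join_quoted_lines` and its `sep` parameter are made up for illustration. It assumes quotes inside a field are escaped by doubling them, so an escaped quote toggles the state twice and only the parity of the quote count on each line matters.

```python
def join_quoted_lines(lines, sep=" "):
    """Yield logical CSV records from physical lines.

    Hypothetical helper: lines that fall inside an open double-quoted
    field are merged into the record that opened the quote.
    """
    buffer = []
    in_quotes = False
    for line in lines:
        line = line.rstrip("\r\n")
        # Every quote toggles the state. An escaped quote ("") toggles it
        # twice, so an odd number of quotes on a line flips the state.
        if line.count('"') % 2 == 1:
            in_quotes = not in_quotes
        buffer.append(line)
        if not in_quotes:
            # The record is complete: emit it as a single line.
            yield sep.join(buffer)
            buffer = []
    if buffer:
        # Unterminated quote at end of input: emit what is left.
        yield sep.join(buffer)
```

Because the state is reset whenever a quote closes, each row of the original file still comes out as its own record, so multiple rows are not merged into one event.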
The csv filter doesn't handle the double-quoted fields correctly (it throws exception errors), which is why I tried using the multiline codec.
But what I don't understand is why the multiline codec doesn't process the part after the new line. That line doesn't contain a double quote so why is it stopping there?
By the way, thinking about it, if I have multiple rows in the original CSV, won't the multiline codec stuff all of them together, creating problems further down the line if I were to apply the csv filter?
Maybe I'd be better off trying to make a script to "fix" the double quoted fields before ingesting the data with Logstash.
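A preprocessing script along those lines could be sketched in Python with the standard `csv` module, which already understands quoted fields that span lines. This is only a sketch: the function name `flatten_csv` and the choice to replace embedded line breaks with a single space are assumptions, not something Logstash requires.

```python
import csv
import re

# Matches a run of line breaks plus any whitespace around it.
_LINE_BREAKS = re.compile(r"\s*[\r\n]+\s*")


def flatten_csv(src, dst, replacement=" "):
    """Copy CSV rows from src to dst, removing line breaks inside fields.

    Hypothetical helper: src and dst are text file objects. Each output
    row occupies exactly one physical line.
    """
    reader = csv.reader(src)
    writer = csv.writer(dst, lineterminator="\n")
    for row in reader:
        writer.writerow([_LINE_BREAKS.sub(replacement, field) for field in row])
```

When using real files, open both with `newline=""` as the `csv` module documentation recommends, for example `open("in.csv", newline="")` and `open("out.csv", "w", newline="")`, where both file names are placeholders. The output can then be read by Logstash one line per event, without the multiline codec.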