but with a lot more fields. The issue here is that one field is a JSON structure which is unquoted. When I do not set "quote_char" I get parse errors: "Illegal quoting in line 1.".
When I set "quote_char" to "" I get an error: ":quote_char has to be a single character String".
So what's the correct way to parse CSV which does not have quotes?
You could use a date filter, or you could use mutate+add_field to add a new field that comprises the four fields of interest, then use a csv filter to chop that up and convert them.
After digging a bit deeper I found that the CSV filter is based on Ruby's CSV class, which in turn follows RFC 4180, which states that "If fields are not enclosed with double quotes, then double quotes may not appear inside the fields." So that was my wrong assumption. MySQL (which generates my data) does not quote fields at all.
So the only way the CSV filter might be able to parse MySQL's CSV is to set :liberal_parsing to true. Is there any way I could do that?
Oh… and that's only available since Ruby 2.4.0 (feature #11839).
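For anyone who wants to see the difference outside Logstash, here is a minimal sketch in plain Ruby. It assumes a Ruby whose CSV library supports the liberal_parsing option; the sample row is made up for illustration and is not my real data.

```ruby
require "csv"

# Hypothetical sample row: the second field is unquoted JSON,
# similar to what an unquoted MySQL export produces.
line = '1,{"a":1},x'

# Default (strict) parsing rejects a double quote inside an unquoted field.
strict_error = begin
  CSV.parse_line(line)
  nil
rescue CSV::MalformedCSVError => e
  e
end

# With liberal_parsing the quotes are kept as literal characters.
fields = CSV.parse_line(line, liberal_parsing: true)

puts strict_error.class
puts fields.inspect
```

Note that this only relaxes the quoting rule. A comma inside the JSON is still treated as a field separator, so a JSON field with more than one key would still be split across columns.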