Csv filter plugin: how to deal with masked quotations?

Hi,

I have the following CSV file:

"@timestamp";"col_a";"col_b";"col_c";"col_d"
"2019-08-23 04:43:16.821";"<?xml version=\"1.0\" encoding=\"UTF-8\"?>";b;c;d

I am trying to parse it in Logstash with the following filter:

filter
{
  csv
  {
    autodetect_column_names => true
    autogenerate_column_names => true
    separator => ";"
    source => "message"
    skip_empty_columns => "true"
    target => "mycsv"
  }
}

The second line throws the following error:

"2019-08-23 04:43:16.821";"<?xml version=\"1.0\" encoding=\"UTF-8\"?>";b;c;d
[2019-08-23T12:02:38,204][WARN ][logstash.filters.csv     ] Error parsing csv {:field=>"message", :source=>"\"2019-08-23 04:43:16.821\";\"<?xml version=\\\"1.0\\\" encoding=\\\"UTF-8\\\"?>\";b;c;d\r", :exception=>#<CSV::MalformedCSVError: Missing or stray quote in line 1>}
[2019-08-23T12:02:38,208][INFO ][logstash.outputs.file    ] Opening file {:path=>"c:/work/elastic/input/myoutput.json"}
[2019-08-23T12:02:38,216][INFO ][logstash.outputs.file    ] Opening file {:path=>"c:/work/elastic/input/myoutput.log"}
{
    "@timestamp" => 2019-08-23T10:02:38.099Z,
          "host" => "dtpbl0319",
      "@version" => "1",
       "message" => "\"2019-08-23 04:43:16.821\";\"<?xml version=\\\"1.0\\\" encoding=\\\"UTF-8\\\"?>\";b;c;d\r",
          "tags" => [
        [0] "_csvparsefailure"
    ]
}
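For context, the csv filter is backed by Ruby's CSV parser, which only understands doubling ("") as a quote escape, not backslash-escaping (\"). A minimal Ruby sketch reproducing the failure on the problematic line:

```ruby
require 'csv'

# The problematic line; in a single-quoted Ruby string, \" is a literal
# backslash followed by a double quote.
line = '"2019-08-23 04:43:16.821";"<?xml version=\"1.0\" encoding=\"UTF-8\"?>";b;c;d'

error = nil
begin
  row = CSV.parse_line(line, col_sep: ';')
rescue CSV::MalformedCSVError => e
  # The quote after the backslash closes the field early, so the rest of
  # the XML declaration becomes stray text after a quoted field.
  error = e.message
end
```

The exact error message varies between Ruby versions, but the line always fails to parse.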

Changing the quote_char is not an option for me. The input comes from logs where the content can contain single or double quotes.

Is there a way the csv filter can deal with masked (escaped) quote_chars, so that it handles them as normal text?

Background: I am exporting query results from a customer via Kibana's CSV export. Now I am trying to import the results back into an independent Elasticsearch instance for our developers.

Thanks, Andreas

OK, sorry, I found the issue here:

Kibana's CSV output correctly masks a double quote " by doubling it to "".
That is accepted by the csv filter plugin.

I had changed it to \" myself during debugging. My fault.
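For reference, the doubled-quote form that Kibana writes is exactly what Ruby's CSV parser (and therefore the csv filter) expects:

```ruby
require 'csv'

# Same line as before, but with the quotes doubled instead of
# backslash-escaped - the RFC 4180 style Kibana produces.
line = '"2019-08-23 04:43:16.821";"<?xml version=""1.0"" encoding=""UTF-8""?>";b;c;d'

row = CSV.parse_line(line, col_sep: ';')
# row[1] => '<?xml version="1.0" encoding="UTF-8"?>'
```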

The main cause of my issue was that Kibana's CSV export does not mask newlines as \n. So one CSV row may consist of many physical lines if you have multiline values, and the csv filter plugin does not like that. At first I thought it was an issue with the quotes.
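A small Ruby sketch of that multiline behaviour (with made-up field values): parsed as a whole document, the row is fine, but the first physical line on its own has an unclosed quote, which is what a line-oriented input hands to the filter:

```ruby
require 'csv'

# One logical CSV row whose second field contains an embedded newline.
text = %Q{"2019-08-23 04:43:16.821";"line one\nline two";b;c;d\n}

# Parsed as a whole, this is ONE row of five fields:
rows = CSV.parse(text, col_sep: ';')
# rows.length => 1

# But splitting on newlines first, as a line-oriented input does,
# produces a fragment with an unclosed quoted field:
fragment = text.split("\n").first
error = nil
begin
  CSV.parse_line(fragment, col_sep: ';')
rescue CSV::MalformedCSVError => e
  error = e.message
end
```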

If the data really does look like that, then indeed the csv filter will choke on it. I would work around it using:

 mutate { gsub => [ "message", '[\\]"', "!!BackslashDoubleQuote!!" ] }

Then gsub back after the csv filter.
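In plain Ruby, the mask/parse/unmask roundtrip that the two gsubs perform looks roughly like this (here the placeholder is restored to a plain "; restoring the original \" instead works the same way):

```ruby
require 'csv'

line = '"2019-08-23 04:43:16.821";"<?xml version=\"1.0\" encoding=\"UTF-8\"?>";b;c;d'

# 1. Mask the backslash-escaped quotes, as the mutate/gsub filter would:
masked = line.gsub('\\"', '!!BackslashDoubleQuote!!')

# 2. With no stray quotes left, the line parses cleanly:
row = CSV.parse_line(masked, col_sep: ';')

# 3. Restore the quotes in each field after parsing:
restored = row.map { |f| f.to_s.gsub('!!BackslashDoubleQuote!!', '"') }
# restored[1] => '<?xml version="1.0" encoding="UTF-8"?>'
```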

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.