Csv filter plugin: how to deal with masked quotations?

Hi,

I have the following CSV file:

"@timestamp";"col_a";"col_b";"col_c";"col_d"
"2019-08-23 04:43:16.821";"<?xml version=\"1.0\" encoding=\"UTF-8\"?>";b;c;d

I am trying to parse it in Logstash with the following filter:

filter
{
  csv
  {
    autodetect_column_names => true
    autogenerate_column_names => true
    separator => ";"
    source => "message"
    skip_empty_columns => "true"
    target => "mycsv"
  }
}

The second line throws the following error:

"2019-08-23 04:43:16.821";"<?xml version=\"1.0\" encoding=\"UTF-8\"?>";b;c;d
[2019-08-23T12:02:38,204][WARN ][logstash.filters.csv     ] Error parsing csv {:field=>"message", :source=>"\"2019-08-23 04:43:16.821\";\"<?xml version=\\\"1.0\\\" encoding=\\\"UTF-8\\\"?>\";b;c;d\r", :exception=>#<CSV::MalformedCSVError: Missing or stray quote in line 1>}
[2019-08-23T12:02:38,208][INFO ][logstash.outputs.file    ] Opening file {:path=>"c:/work/elastic/input/myoutput.json"}
[2019-08-23T12:02:38,216][INFO ][logstash.outputs.file    ] Opening file {:path=>"c:/work/elastic/input/myoutput.log"}
{
    "@timestamp" => 2019-08-23T10:02:38.099Z,
          "host" => "dtpbl0319",
      "@version" => "1",
       "message" => "\"2019-08-23 04:43:16.821\";\"<?xml version=\\\"1.0\\\" encoding=\\\"UTF-8\\\"?>\";b;c;d\r",
          "tags" => [
        [0] "_csvparsefailure"
    ]
}
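For context, the csv filter is backed by Ruby's CSV parser, which only understands doubling ("") as a quote escape, not backslash-escaping (\"). A minimal Ruby sketch reproducing the failure on the problematic line:

```ruby
require 'csv'

# The problematic line; in a single-quoted Ruby string, \" is a literal
# backslash followed by a double quote.
line = '"2019-08-23 04:43:16.821";"<?xml version=\"1.0\" encoding=\"UTF-8\"?>";b;c;d'

error = nil
begin
  row = CSV.parse_line(line, col_sep: ';')
rescue CSV::MalformedCSVError => e
  # The quote after the backslash closes the field early, so the rest of
  # the XML declaration becomes stray text after a quoted field.
  error = e.message
end
```

The exact error message varies between Ruby versions, but the line always fails to parse.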

Changing the quote_char is not an option for me. The input comes from logs where the content can contain single or double quotes.

Is there a way the csv filter can deal with masked (escaped) quote_chars, so that it handles them as normal text?

Background: I am exporting query results from a customer via Kibana's CSV export. Now I am trying to import the results back into an independent Elasticsearch instance for our developers.

Thanks, Andreas

OK, sorry, I found the issue here:

Kibana's CSV output correctly masks a double quote " by doubling it to "".
That is accepted by the csv filter plugin.

I had changed it to \" myself during debugging. My fault.
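For reference, the doubled-quote form that Kibana writes is exactly what Ruby's CSV parser (and therefore the csv filter) expects:

```ruby
require 'csv'

# Same line as before, but with the quotes doubled instead of
# backslash-escaped - the RFC 4180 style Kibana produces.
line = '"2019-08-23 04:43:16.821";"<?xml version=""1.0"" encoding=""UTF-8""?>";b;c;d'

row = CSV.parse_line(line, col_sep: ';')
# row[1] => '<?xml version="1.0" encoding="UTF-8"?>'
```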

The main cause of my issue was that Kibana's CSV export does not mask newlines as \n. So one CSV row may consist of many physical lines if you have multiline values, and the csv filter plugin does not like that. At first I thought it was an issue with the quotes.
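A small Ruby sketch of that multiline behaviour (with made-up field values): parsed as a whole document, the row is fine, but the first physical line on its own has an unclosed quote, which is what a line-oriented input hands to the filter:

```ruby
require 'csv'

# One logical CSV row whose second field contains an embedded newline.
text = %Q{"2019-08-23 04:43:16.821";"line one\nline two";b;c;d\n}

# Parsed as a whole, this is ONE row of five fields:
rows = CSV.parse(text, col_sep: ';')
# rows.length => 1

# But splitting on newlines first, as a line-oriented input does,
# produces a fragment with an unclosed quoted field:
fragment = text.split("\n").first
error = nil
begin
  CSV.parse_line(fragment, col_sep: ';')
rescue CSV::MalformedCSVError => e
  error = e.message
end
```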

If the data really does look like that, then indeed the csv filter will choke on it. I would work around it using:

 mutate { gsub => [ "message", '[\\]"', "!!BackslashDoubleQuote!!" ] }

Then gsub back after the csv filter.
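In plain Ruby, the mask/parse/unmask roundtrip that the two gsubs perform looks roughly like this (here the placeholder is restored to a plain "; restoring the original \" instead works the same way):

```ruby
require 'csv'

line = '"2019-08-23 04:43:16.821";"<?xml version=\"1.0\" encoding=\"UTF-8\"?>";b;c;d'

# 1. Mask the backslash-escaped quotes, as the mutate/gsub filter would:
masked = line.gsub('\\"', '!!BackslashDoubleQuote!!')

# 2. With no stray quotes left, the line parses cleanly:
row = CSV.parse_line(masked, col_sep: ';')

# 3. Restore the quotes in each field after parsing:
restored = row.map { |f| f.to_s.gsub('!!BackslashDoubleQuote!!', '"') }
# restored[1] => '<?xml version="1.0" encoding="UTF-8"?>'
```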

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.