CSV filter - How can I parse unquoted CSV?


#1

My data looks similar to this:

2019-02-15T10:56:00.000+000\t{"test":"value","more":{"property":"value"}}

but with a lot more fields. The issue here is that one field is a JSON structure, which is unquoted. When I do not set "quote_char" I get a parse error: "Illegal quoting in line 1.".

When I set "quote_char" to "" I get an error because ":quote_char has to be a single character String".

So what's the correct way to parse CSV which does not have quotes?


#2

My current approach is to use the "dissect" plugin:

filter {
  dissect {
    mapping => {
      message => "%{scan_id}	%{result_id}	%{created_scan}	%{created_at}	%{updated_at}	%{updated_scan}	%{scanner_type}	%{duration}	%{recurrentscan}	%{freescan}	%{dangerlevel}	%{has_error}	%{url}	%{complete_request}"
    }
  }
  mutate {
    remove_field => ["message"]
    strip => ["complete_request"]
    convert => {
      "scan_id"          => "integer"
      "result_id"        => "integer"
      "scanner_type"     => "string"
      "duration"         => "integer"
      "recurrentscan"    => "boolean"
      "freescan"         => "boolean"
      "dangerlevel"      => "integer"
      "has_error"        => "boolean"
      "url"              => "string"
    }
  }
}

Unfortunately, there seems to be no way here to tell Logstash that four of the fields are date_time values, which I could do with the CSV plugin.
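The closest equivalent I can see would be a separate date filter per field after the dissect, something like this (a sketch only; "ISO8601" is my assumption about the timestamp format, and each field needs its own block):

```
filter {
  # One date filter per timestamp field; without "target" the parsed
  # result would go into @timestamp instead of back into the field.
  date {
    match  => ["created_at", "ISO8601"]
    target => "created_at"
  }
  date {
    match  => ["updated_at", "ISO8601"]
    target => "updated_at"
  }
  # ...and likewise for created_scan and updated_scan
}
```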

So if there is anyone having an idea how to resolve this, feel free to add…


#3

How do you do that using the csv filter?


#4

Does this help? https://www.elastic.co/guide/en/logstash/current/plugins-filters-csv.html#plugins-filters-csv-convert


#5

You could use a date filter, or you could use mutate+add_field to add a new field that combines the four fields of interest, then use a csv filter to chop that up and convert them.
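A sketch of the second idea, assuming the field names from the dissect config in #2 (the temporary field name "date_fields" is made up here):

```
filter {
  mutate {
    # Combine the four timestamp fields into one comma-separated field.
    add_field => {
      "date_fields" => "%{created_scan},%{created_at},%{updated_at},%{updated_scan}"
    }
  }
  csv {
    # Chop the combined field back up and convert the columns;
    # the timestamps contain no commas or quotes, so plain CSV works.
    source  => "date_fields"
    columns => ["created_scan", "created_at", "updated_at", "updated_scan"]
    convert => {
      "created_scan" => "date_time"
      "created_at"   => "date_time"
      "updated_at"   => "date_time"
      "updated_scan" => "date_time"
    }
    remove_field => ["date_fields"]
  }
}
```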


#6

Sounds a bit overcomplicated. I still wonder why it's not possible to parse unquoted CSV.


#7

After digging a bit deeper I found that the CSV filter is based on Ruby's CSV class, which in turn follows RFC 4180. The RFC states: "If fields are not enclosed with double quotes, then double quotes may not appear inside the fields." So that was my wrong assumption: MySQL (which generates my data) does not quote fields at all.

So the only way the CSV filter might be able to parse MySQL's CSV is to set

:liberal_parsing

to true. Is there any way I could do that?

Oh… And that's only available since CSV 2.4.0 (feature #11839)
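For reference, the difference is easy to see in plain Ruby (a standalone sketch, not a Logstash config; requires the csv library at 2.4.0 or newer for the liberal_parsing option):

```ruby
require "csv"

# A tab-separated line whose second column is unquoted JSON,
# like the MySQL export above.
line = "2019-02-15T10:56:00.000+000\t{\"test\":\"value\"}"

# Strict RFC 4180 parsing rejects the bare double quotes
# appearing inside the unquoted field.
begin
  CSV.parse_line(line, col_sep: "\t")
rescue CSV::MalformedCSVError => e
  puts "strict parse failed: #{e.message}"
end

# With liberal_parsing the quotes are kept as literal data.
row = CSV.parse_line(line, col_sep: "\t", liberal_parsing: true)
puts row.inspect
```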


#8

No, the filter does not support that.


(system) closed #9

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.