CSV filter - How can I parse unquoted CSV?


#1

My data looks similar to this:

2019-02-15T10:56:00.000+000\t{"test":"value","more":{"property":"value"}}

but with a lot more fields. The issue here is that one field is a JSON structure, which is unquoted. When I do not set "quote_char" I get a parse error: "Illegal quoting in line 1.".

When I set "quote_char" to "" I get an error because ":quote_char has to be a single character String".

So what's the correct way to parse CSV which does not have quotes?


#2

My current approach is to use the "dissect" plugin:

filter {
  dissect {
    mapping => {
      message => "%{scan_id}	%{result_id}	%{created_scan}	%{created_at}	%{updated_at}	%{updated_scan}	%{scanner_type}	%{duration}	%{recurrentscan}	%{freescan}	%{dangerlevel}	%{has_error}	%{url}	%{complete_request}"
    }
  }
  mutate {
    remove_field => ["message"]
    strip => ["complete_request"]
    convert => {
      "scan_id"          => "integer"
      "result_id"        => "integer"
      "scanner_type"     => "string"
      "duration"         => "integer"
      "recurrentscan"    => "boolean"
      "freescan"         => "boolean"
      "dangerlevel"      => "integer"
      "has_error"        => "boolean"
      "url"              => "string"
    }
  }
}

Unfortunately, there seems to be no way here to tell Logstash that four of the fields are date_time values, which I could do with the CSV plugin.
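The closest equivalent I can see would be a separate date filter per field after the dissect, something like this (a sketch only; "ISO8601" is my assumption about the timestamp format, and each field needs its own block):

```
filter {
  # One date filter per timestamp field; without "target" the parsed
  # result would go into @timestamp instead of back into the field.
  date {
    match  => ["created_at", "ISO8601"]
    target => "created_at"
  }
  date {
    match  => ["updated_at", "ISO8601"]
    target => "updated_at"
  }
  # ...and likewise for created_scan and updated_scan
}
```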

So if there is anyone having an idea how to resolve this, feel free to add…


#3

How do you do that using the csv filter?


#4

Does this help? https://www.elastic.co/guide/en/logstash/current/plugins-filters-csv.html#plugins-filters-csv-convert


#5

You could use a date filter, or you could use mutate+add_field to add a new field that combines the four fields of interest, then use a csv filter to chop that up and convert them.
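A sketch of the second idea, assuming the field names from the dissect config in #2 (the temporary field name "date_fields" is made up here):

```
filter {
  mutate {
    # Combine the four timestamp fields into one comma-separated field.
    add_field => {
      "date_fields" => "%{created_scan},%{created_at},%{updated_at},%{updated_scan}"
    }
  }
  csv {
    # Chop the combined field back up and convert the columns;
    # the timestamps contain no commas or quotes, so plain CSV works.
    source  => "date_fields"
    columns => ["created_scan", "created_at", "updated_at", "updated_scan"]
    convert => {
      "created_scan" => "date_time"
      "created_at"   => "date_time"
      "updated_at"   => "date_time"
      "updated_scan" => "date_time"
    }
    remove_field => ["date_fields"]
  }
}
```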


#6

Sounds a bit overcomplicated. I still wonder why it's not possible to parse unquoted CSV.


#7

After digging a bit deeper I found that the CSV filter is based on Ruby's CSV class, which in turn follows RFC 4180. The RFC states: "If fields are not enclosed with double quotes, then double quotes may not appear inside the fields." So that was my wrong assumption: MySQL (which generates my data) does not quote fields at all.

So the only way the CSV filter might be able to parse MySQL's CSV is to set

:liberal_parsing

to true. Is there any way I could do that?

Oh… And that's only available since CSV 2.4.0 (feature #11839)
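For reference, the difference is easy to see in plain Ruby (a standalone sketch, not a Logstash config; requires the csv library at 2.4.0 or newer for the liberal_parsing option):

```ruby
require "csv"

# A tab-separated line whose second column is unquoted JSON,
# like the MySQL export above.
line = "2019-02-15T10:56:00.000+000\t{\"test\":\"value\"}"

# Strict RFC 4180 parsing rejects the bare double quotes
# appearing inside the unquoted field.
begin
  CSV.parse_line(line, col_sep: "\t")
rescue CSV::MalformedCSVError => e
  puts "strict parse failed: #{e.message}"
end

# With liberal_parsing the quotes are kept as literal data.
row = CSV.parse_line(line, col_sep: "\t", liberal_parsing: true)
puts row.inspect
```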


#8

No, the filter does not support that.


(system) closed #9

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.