Problem with quotes in csv filter

Hello,

I'm having a problem with the csv plugin (it looks like the same problem described in this old (2018) article)

When a field contains a quoted string but is not itself quoted (because it does not contain the field separator), the embedded quotes are interpreted as field quoting, and this provokes a parsing error.

For example, this input line:

2025-10-16 01:44:55,"style :expression (" (Parameter),5,64.39.106.92,10.97.203.236,/Common/dem-www-pr_443,Cross Site Scripting (XSS),1.0

raises the following error:

Error parsing csv {:field=>"message", :source=>"2025-10-16 01:44:55,\"style :expression (\" (Parameter),5,64.39.106.92,10.97.203.236,/Common/dem-www-pr_443,Cross Site Scripting (XSS),1.0", :exception=>#<CSV::MalformedCSVError: Missing or stray quote in line 1>

If I set a different char as quote_char:

    csv {
        columns => [ "Date_Time", "Signature_Name", "Severity_Id", "Source_Ip", "Destination_Ip", "Policy", "Attack_Type", "Hit_Counter"]
        separator => ","
        quote_char => '§'
    }

records with fields that contain the separator character, and are therefore quoted, are incorrectly split:

2025-10-16 20:54:44,"Generic Remote File/Path Include Attempt 4 (dir param, http/https)",4,64.39.106.3,10.97.219.194,/Common/pdc-www-pr_80,Remote File Include,1.0

the second field, Signature_Name, is terminated at the comma ("Signature_Name"=>"\"Generic Remote File/Path Include Attempt 4 (dir param"), and a new autonamed field gets added: "column9"=>"1.0".

The whole record then gets rejected by Elasticsearch:
Could not index event to Elasticsearch. {:status=>400, [ . . . ] "error"=>{"type"=>"mapper_parsing_exception", "reason"=>"failed to parse field [Source_Ip] of type [ip] in document with id 'qFaL8ZkBMYlRvXlrtBPW'. Preview of field's value: '4'"

The plugin version is logstash-filter-csv (3.1.1). As it was released in June 2021, I suppose this bug isn't going to be fixed, so any advice for a workaround is welcome.

Paolo

I don't think most folks consider it a bug. If a field contains a comma it has to be quoted with ", and if a field is quoted then the entire field must be quoted and any quotes within it must be doubled to "". That's just the way Ruby's CSV parser works.
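For example, assuming the second field of your first line is meant to contain "style :expression (" (Parameter), the RFC-4180 form of that line would be:

2025-10-16 01:44:55,"""style :expression ("" (Parameter)",5,64.39.106.92,10.97.203.236,/Common/dem-www-pr_443,Cross Site Scripting (XSS),1.0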

If you have a recognizable pattern like a pair of IP addresses you could try parsing the line using grok.
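Something along these lines might work, for example (an untested sketch, reusing the column names from your config and anchoring on the two IP addresses):

    grok {
        match => {
            "message" => "^%{TIMESTAMP_ISO8601:Date_Time},%{DATA:Signature_Name},%{INT:Severity_Id},%{IP:Source_Ip},%{IP:Destination_Ip},%{DATA:Policy},%{DATA:Attack_Type},%{NUMBER:Hit_Counter}$"
        }
    }

Signature_Name would keep any surrounding quotes, which you could strip afterwards with a mutate gsub.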

There are various tools, like csvlint, to check CSV validity against e.g. RFC 4180.

I think we've all struggled with this type of thing before. My usual approach is to try to find a separator that just won't appear in any field, as you generally know a bit about the expected data, and then double-check with simple tools like awk or similar.

My problem is not with the fields that are quoted because they contain commas, but with those that contain quoted strings (i.e. ...<fs>This should be a "single" field<fs>...), like the one in the first example.
But I see now it's not RFC-4180 compliant: "If fields are not enclosed with double quotes, then double quotes may not appear inside the fields."

The "problematic" field is only the second one, so I suppose I could use some combination of dissect and grok as a workaround, but I hoped in something simpler.

PS: definitely not a Ruby CSV bug

What is the source of this data? Do you have any control over how this data is formatted?

It's the output of an IBM QRadar AQL Saved Search. The endpoint is documented here (funny, it says "[...] The formats are RFC compliant and can be JSON, CSV, XML, or tabular text.")

I thought requesting application/csv as the return format would've saved me the json=>csv conversion (csv is more concise).
However, yes, I could switch to json output (and then the json filter), or edit the saved search and move the offending field to the last position (and then use dissect).
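In that last case something like this should work, since the final dissect field captures everything up to the end of the line, commas and quotes included (the column order here is just my guess at how the reordered search would come out):

    dissect {
        mapping => {
            "message" => "%{Date_Time},%{Severity_Id},%{Source_Ip},%{Destination_Ip},%{Policy},%{Attack_Type},%{Hit_Counter},%{Signature_Name}"
        }
    }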

If you can change to JSON I would recommend that, it is much easier to work with in Logstash.
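The filter side then reduces to something like this (a minimal sketch, assuming the whole event body arrives in the message field):

    json {
        source => "message"
    }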
