Change CSV Delimiter in the Source data

#1

My source CSV contains "," delimiters but has field data which contain this delimiter within double quotes.

I originally started the approach to remove the single "," within the double quoted strings since the csv{} filter identified the input has being malformed, but I noticed that I have some other fields containing double quotes with commas which are needed.

I decided to use mutate+gsub to identify all "," delimiters in the source data and replace them with a special character - except to leave the single quote alone in the double quote strings. By doing this, I'm hoping the CSV{} filter can then properly parse the data.

The problem I am having now is my regex expression contains double quote characters and the gsub function is complaining....

[ERROR] 2019-05-17 09:36:07.124 [Converge PipelineAction::Create] agent - Failed to execute action {:action=>LogStash::PipelineAction::Create/pipeline_id:main, :exception=>"LogStash
::ConfigurationError", :message=>"Expected one of #, {, ,, ] at line 10, column 32 (byte 168) after filter {\n mutate {\n gsub => ["message", "/(?!\B"", :backtrace=>["/usr/share/l
ogstash/logstash-core/lib/logstash/compiler.rb:41:in compile_imperative'", "/usr/share/logstash/logstash-core/lib/logstash/compiler.rb:49:incompile_graph'", "/usr/share/logstash/logsta
sh-core/lib/logstash/compiler.rb:11:in block in compile_sources'", "org/jruby/RubyArray.java:2577:inmap'", "/usr/share/logstash/logstash-core/lib/logstash/compiler.rb:10:in compile_so urces'", "org/logstash/execution/AbstractPipelineExt.java:151:ininitialize'", "org/logstash/execution/JavaBasePipelineExt.java:47:in initialize'", "/usr/share/logstash/logstash-core/li b/logstash/java_pipeline.rb:23:ininitialize'", "/usr/share/logstash/logstash-core/lib/logstash/pipeline_action/create.rb:36:in execute'", "/usr/share/logstash/logstash-core/lib/logstas h/agent.rb:325:inblock in converge_state'"]}

Here is my filter code.... can someone tell me the proper syntax to allow this regex code to be properly recognized by the function?

filter {
mutate {
gsub => ["message", "/(?!\B"[^"]),(?![^"]"\B)/g", ""] } csv { separator => ""
skip_empty_columns => "true"
skip_empty_rows => "true"
columns => ["Asset Alternate IPv4 Addresses","Asset Alternate IPv6 Addresses","Asset Criticality","Asset ID","Asset IP Address","Asset Location","Asset MAC Addresses","Asset Names",$
}

Thank you!!!

(Charlie) #2

Could you give us one exemplary line of that csv that is causing issues?

#3

Sure.... here are a few lines (from a Nexpose VA report).....

Fields such as "Risk Score" have a comma. Some fields have multiple CVE references in double quotes with each URL separated by a single quote. The regex code in my first post successfully identifies each single quote outside double quote strings so it makes sense in my head to replace that with something that is not found in the strings contained in the "double quotes".

Asset Alternate IPv4 Addresses,Asset Alternate IPv6 Addresses,Asset Criticality,Asset ID,Asset IP Address,Asset Location,Asset MAC Addresses,Asset Names,Asset OS Family,Asset OS Name,Asset OS Version,Asset Owner,Asset Risk Score,Custom Tag,Exploit Count,Exploit Minimum Skill,Exploit URLs,Malware Kit Count,Malware Kit Names,Scan ID,Scan Template Name,Service Name,Service Port,Service Product,Service Protocol,Site Importance,Site Name,Vulnerability Additional URLs,Vulnerability Age,Vulnerability CVE IDs,Vulnerability CVE URLs,Vulnerability CVSS Score,Vulnerability CVSSv3 Score,Vulnerability CVSSv3 Vector,Vulnerability CVSS Vector,Vulnerability ID,Vulnerability PCI Compliance Status,Vulnerability Proof,Vulnerability Published Date,Vulnerability Reference IDs,Vulnerability Reference URLs,Vulnerability Risk Score,Vulnerability Severity Level,Vulnerability Tags,Vulnerability Test Date,Vulnerability Test Result Code,Vulnerability Title,Vulnerable Since
,,,412,192.168.65.217,,00:50:56:A7:C3:C3,ubuntu,Ubuntu Linux,Ubuntu Linux,16.04,,"2,816",,0,,,0,,308,Full audit without Web Spider,System,0,,ip,Normal,JEREMYs Assets,,59 Days,CVE-2018-19824,http://nvd.nist.gov/vuln/detail/CVE-2018-19824,4.6,7.8,CVSS:3.0/AV:L/AC:L/PR:L/UI:N/S:U/C:H/I:H/A:H,(AV:L/AC:L/Au:N/C:P/I:P/A:P),ubuntu-cve-2018-19824,Fail,"Vulnerable OS: Ubuntu Linux 16.04

Vulnerable software installed: Ubuntu linux-image-generic 4.4.0.141.147",2018-12-03,"UBUNTU:3879-1,UBUNTU:3879-2,UBUNTU:3930-1,UBUNTU:3930-2,UBUNTU:3931-1,UBUNTU:3931-2,UBUNTU:3933-1,UBUNTU:3933-2","https://usn.ubuntu.com/3879-1/,https://usn.ubuntu.com/3879-2/,https://usn.ubuntu.com/3930-1/,https://usn.ubuntu.com/3930-2/,https://usn.ubuntu.com/3931-1/,https://usn.ubuntu.com/3931-2/,https://usn.ubuntu.com/3933-1/,https://usn.ubuntu.com/3933-2/",221,5,"Canonical,Ubuntu Linux",2019-03-27,vv,Ubuntu: (Multiple Advisories) (CVE-2018-19824): Linux kernel (Trusty HWE) vulnerabilities,2019-03-18
,,,412,192.168.65.217,,00:50:56:A7:C3:C3,ubuntu,Ubuntu Linux,Ubuntu Linux,16.04,,"2,816",,0,,,0,,308,Full audit without Web Spider,System,0,,ip,Normal,JEREMYs Assets,,59 Days,CVE-2019-3823,http://nvd.nist.gov/vuln/detail/CVE-2019-3823,5.0,7.5,CVSS:3.0/AV:N/AC:L/PR:N/UI:N/S:U/C:N/I:N/A:H,(AV:N/AC:L/Au:N/C:N/I:N/A:P),ubuntu-cve-2019-3823,Pass,"Vulnerable OS: Ubuntu Linux 16.04

Vulnerable software installed: Ubuntu libcurl3-gnutls 7.47.0-1ubuntu2.11",2019-02-06,"UBUNTU:3882-1,DEBIAN:DSA-4386","https://usn.ubuntu.com/3882-1/,https://security-tracker.debian.org/tracker/DSA-4386",142,5,"Canonical,Mail,Ubuntu Linux",2019-03-27,vv,Ubuntu: USN-3882-1 (CVE-2019-3823): curl vulnerabilities,2019-03-18
,,,412,192.168.65.217,,00:50:56:A7:C3:C3,ubuntu,Ubuntu Linux,Ubuntu Linux,16.04,,"2,816",,0,,,0,,308,Full audit without Web Spider,System,0,,ip,Normal,JEREMYs Assets,,59 Days,CVE-2019-3822,http://nvd.nist.gov/vuln/detail/CVE-2019-3822,7.5,9.8,CVSS:3.0/AV:N/AC:L/PR:N/UI:N/S:U/C:H/I:H/A:H,(AV:N/AC:L/Au:N/C:P/I:P/A:P),ubuntu-cve-2019-3822,Fail,"Vulnerable OS: Ubuntu Linux 16.04

Vulnerable software installed: Ubuntu libcurl3-gnutls 7.47.0-1ubuntu2.11",2019-02-06,"UBUNTU:3882-1,DEBIAN:DSA-4386","https://usn.ubuntu.com/3882-1/,https://security-tracker.debian.org/tracker/DSA-4386",540,8,"Canonical,Ubuntu Linux,Web",2019-03-27,vv,Ubuntu: USN-3882-1 (CVE-2019-3822): curl vulnerabilities,2019-03-18
,,,412,192.168.65.217,,00:50:56:A7:C3:C3,ubuntu,Ubuntu Linux,Ubuntu Linux,16.04,,"2,816",,0,,,0,,308,Full audit without Web Spider,System,0,,ip,Normal,JEREMYs Assets,,59 Days,CVE-2019-6454,http://nvd.nist.gov/vuln/detail/CVE-2019-6454,4.9,5.5,CVSS:3.0/AV:L/AC:L/PR:L/UI:N/S:U/C:N/I:N/A:H,(AV:L/AC:L/Au:N/C:N/I:N/A:C),ubuntu-cve-2019-6454,Pass,"Vulnerable OS: Ubuntu Linux 16.04

Vulnerable software installed: Ubuntu systemd 229-4ubuntu21.15",2019-02-18,UBUNTU:3891-1,https://usn.ubuntu.com/3891-1/,91.5,5,"Canonical,Denial of Service,Ubuntu Linux",2019-03-27,vv,Ubuntu: USN-3891-1 (CVE-2019-6454): systemd vulnerability,2019-03-18
,,,412,192.168.65.217,,00:50:56:A7:C3:C3,ubuntu,Ubuntu Linux,Ubuntu Linux,16.04,,"2,816",,0,,,0,,308,Full audit without Web Spider,System,0,,ip,Normal,JEREMYs Assets,,50 Days,CVE-2019-1559,http://nvd.nist.gov/vuln/detail/CVE-2019-1559,4.3,5.9,CVSS:3.0/AV:N/AC:H/PR:N/UI:N/S:U/C:H/I:N/A:N,(AV:N/AC:M/Au:N/C:P/I:N/A:N),ubuntu-cve-2019-1559,Fail,"Vulnerable OS: Ubuntu Linux 16.04

#4

If you have double quotes in your regexp then use single quotes instead of double quotes around it. For example, to remove all double quotes from a string (and I know that's not what you want to do) you would use

mutate { gsub => [ "message", '"', "" ] }
#5

That appeared to have worked. I had to rejig the expression to run and it appears to be modifying the message contents and subbing commas with $.

filter {
mutate {
gsub => ["message", '(?!\B"[^"]),(?![^"]"\B)', "$"]
}

It's still complaining about some illegal quoting....

[WARN ] 2019-05-17 10:45:05.960 [[main]>worker0] csv - Error parsing csv {:field=>"message", :source=>"Linux",2019-03-27,vv,Ubuntu: (Multiple Advisories) (CVE-2018-19824): Linux kernel (
Trusty HWE) vulnerabilities,2019-03-18 ,,,412,192.168.65.217,,00:50:56:A7:C3:C3,ubuntu,Ubuntu ", :exception=>#<CSV::MalformedCSVError: Illegal quoting in line 1.>}

Perhaps I need to get rid of the double quotes?

#6

Removing the double quotes appears to have worked to remove the illegal character issue.

filter {
mutate {
gsub => ["message", '(?!\B"[^"]),(?![^"]"\B)', "$", "message", '"', ""]
}

Everything is being parsed without error. There are still problems with matching the fields to the values but that's another issue.

Thanks folks.

#7

If you look at the messages that you posted you will see there are characters missing from your regexps and random parts of them are italic. That makes it really hard for folks to know what your regexps actually look like. If you have something like

gsub => [ "message", "_[^]*]]_[0-9]*"

then it appears as

gsub => [ "message", "[^]*]][0-9]*"

unless you either escape the markdown characters (which I did above), or indent it by four spaces, so that it appears as

gsub => [ "message", "_[^\]*]]_[0-9]*"

or surround it with lines containing three backticks

```
gsub => [ "message", "_[^\]*]]_[0-9]*"
```

in which case it will appear as

gsub => [ "message", "_[^\]*]]_[0-9]*"

You should also check the preview pane on the right to see what you post is going to look like to others.