Parse CSV with a variable number of columns

What would be the best way to parse a simple file
where the second column, "optional_retry_count", is sometimes present and sometimes not?

message_type, optional_retry_count, date, xml_data

EURec,20190121150448,<?xml version="1.0"....
EUSlc,2,20190121150458,<?xml version="1.0"....
EUNot,3,20190121150468,<?xml version="1.0"....
EUSet,20190121150488,<?xml version="1.0"....
EURec,0,20190121150498,<?xml version="1.0"....

I am currently using:

filter {
  grok {
    match => { "message" => "%{DATA:type},%{DATA:date},%{GREEDYDATA:xmldata}" }
  }
}

What is the best way to decode the optional second field, retry?

%{DATA:retry}

Should I use grok, dissect, or csv, and how do I incorporate the condition?

expected result:

{
  "date": "20190121150458",
  "type": "EUSlc",
  "xmldata": "<?xml version=\"1.0\"....\r",
  "retry": "2"
}

Thank you

A csv filter will not like the XML data, so I would use dissect.

if [message] =~ /^[^,]+,[^,]+,[^,]+,<\?xml/ {
    dissect { mapping => { "message" => "%{message_type},%{retry_count},%{date},%{xml_data}" } }
} else if [message] =~ /^[^,]+,[^,]+,<\?xml/ {
    dissect { mapping => { "message" => "%{message_type},%{date},%{xml_data}" } }
} else {
    mutate { add_tag => [ "somethingElse" ] }
}
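
For completeness, if you would rather handle both cases with a single grok pattern, a sketch like the one below should also work. It assumes the type field is alphanumeric and the date field is purely numeric, and it uses the field names from the expected result in the question; it is untested.

filter {
  grok {
    # The retry column is optional: when it is absent, the non-capturing
    # group fails to match and the first numeric field is captured as date.
    match => { "message" => "^%{WORD:type},(?:%{INT:retry},)?%{INT:date},%{GREEDYDATA:xmldata}$" }
  }
}

The dissect approach above is still the cheaper option, since dissect splits on literal delimiters and does not use regular expressions at all.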

Thank you, this solved my problem.
Excellent, fast answer.
