Parsing CSV file with multiple key values in one line

david4 · March 4, 2020, 12:24am

Hello,

I'm new to the ELK stack and have been struggling with one problem. I'm trying to write a Logstash configuration to parse a CSV-format log file like the following:

"time_stamp", 2020-02-28 16:35:16.048, "name", "Dan", "age", 22
"time_stamp", 2020-02-28 16:36:16.048, "name", "John", "age", 23
"time_stamp", 2020-02-28 16:37:16.048, "name", "Lu", "age", 24

Each CSV log line above has the following structure:

column name, value of type date time, column name, value of type String, column name, value of type integer.

Can someone show me how to use Logstash to process such a log line so that when I send it to Elasticsearch and look in Kibana I can see

time_stamp: 2020-02-28 16:35:16.048, name: Dan, age: 22 ?

Note 1: I also want to keep the original data type (date time for "time_stamp" and integer for "age" instead of String) so that I can do some numerical visualizations in Kibana later.

Note 2: for the purposes of my program, I don't know the column names in advance. As a result, I cannot produce a plain CSV log file such as 2020-02-28 16:35:16.048, Dan, 22 because the Logstash configuration would have no way to know which field each value belongs to.

Note 3: I'm currently thinking of using a for loop because the number of columns may change.

Thank you very much!

Badger · March 4, 2020, 6:43pm

You could do that in a ruby filter:

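    # Split the raw line on commas; "message" becomes an array.
    mutate { split => { "message" => "," } }
    ruby {
        code => '
            message = event.get("message")
            # Only proceed when the fields pair up as key, value, key, value, ...
            if message.is_a? Array and message.length % 2 == 0
                while message.length > 0 do
                    item = message.shift(2)
                    # Trim surrounding double quotes and spaces.
                    k = item[0].sub(/^[" ]*/, "").sub(/[" ]*$/, "")
                    v = item[1].sub(/^[" ]*/, "").sub(/[" ]*$/, "")
                    # Store integer-looking values as integers.
                    if v.to_i.to_s == v
                        v = v.to_i
                    end
                    event.set(k, v)
                end
            end
        '
        remove_field => [ "message" ]
    }
    # Parse time_stamp; by default the date filter writes the result to @timestamp.
    date { match => [ "time_stamp", "yyyy-MM-dd HH:mm:ss.SSS" ] }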

I apologize if .sub(/^[" ]*/, "").sub(/[" ]*$/, "") makes your eyeballs bleed. I can't think of the right way to do it right now.
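One possible tidier form of that trimming step, offered as a sketch rather than as the answer's own method, is a single gsub anchored at both ends of the string. The helper name strip_quotes_and_spaces is hypothetical and only there so the snippet can be run outside Logstash:

```ruby
# Hypothetical helper (not part of the original answer): remove any run of
# double quotes and spaces from the start and the end of a string, leaving
# quotes and spaces in the middle untouched.
def strip_quotes_and_spaces(s)
  s.gsub(/\A[" ]+|[" ]+\z/, "")
end

puts strip_quotes_and_spaces(' "Dan"')
puts strip_quotes_and_spaces(' 2020-02-28 16:35:16.048')
```

Inside the ruby filter the same expression could stand in for the two chained sub calls, for example k = item[0].gsub(/\A[" ]+|[" ]+\z/, ""), assuming the fields never need their leading or trailing quotes preserved.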

david4 · March 5, 2020, 10:24pm

Thank you very much!

This is exactly what I need. I do have one question, though: how do you handle cases where a value contains a comma or another delimiter?

Badger · March 5, 2020, 11:07pm

Then you would have to write a much more complicated parser than mutate+split.
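As a rough idea of what such a parser might look like, here is a minimal plain-Ruby sketch that walks the line character by character and only treats a comma as a separator when it is outside double quotes. The method names split_fields and parse_pairs are hypothetical, and the sketch assumes that values never contain escaped or doubled quotes and that leading and trailing whitespace in a field can be discarded. It has not been tested inside a Logstash ruby filter.

```ruby
# Hypothetical helper: split a line on commas that are outside double quotes.
# Assumption: no escaped or doubled quotes appear inside a quoted value.
def split_fields(line)
  fields = []
  current = String.new
  in_quotes = false
  line.each_char do |ch|
    if ch == '"'
      in_quotes = !in_quotes          # entering or leaving a quoted section
    elsif ch == "," && !in_quotes
      fields << current.strip         # separator: finish the current field
      current = String.new
    else
      current << ch                   # ordinary character, or a quoted comma
    end
  end
  fields << current.strip             # last field has no trailing comma
  fields
end

# Hypothetical helper: pair fields up as key, value and convert
# integer-looking values to integers, as in the filter above.
def parse_pairs(line)
  fields = split_fields(line)
  return nil unless fields.length.even?
  fields.each_slice(2).map { |k, v|
    [k, v.to_i.to_s == v ? v.to_i : v]
  }.to_h
end

p parse_pairs('"time_stamp", 2020-02-28 16:35:16.048, "name", "Smith, Dan", "age", 22')
```

In a pipeline, the body of split_fields would replace the mutate+split step, with the ruby filter reading the unsplit "message" string directly.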

