.CSV input with JSON in it :(

Hey Team,

New here, and new to the ELK stack. My colleague and I have spent days trying to figure out a solution to our problem, but we have not come up with a resolution.

Background: Our company has tasked us with ingesting gigabytes' worth of .CSV files to be used with Elastic / Kibana.

Problem: The input is a .CSV file. The first three columns parse fine. The fourth column (and it is always the fourth) contains JSON data that is not properly escaped. Using the ',' delimiter obviously breaks the JSON column into multiple fields, anywhere from 4 to 48, depending on the number of commas in the JSON data.

So far we have looked into using a space as the delimiter, but this did not work either, as there are spaces in the JSON data.

Does anyone know how we can parse the JSON data and prevent the commas and unescaped quotes from causing failures / creating extra fields?

Thank you so much

Are there always four columns? If so, then you could use dissect rather than a csv filter, then use a json filter to parse the JSON:

dissect { mapping => { "message" => "%{col1},%{col2},%{col3},%{restOfLine}" } }
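
The json filter part could look roughly like this (the source and target names are just placeholders matching the dissect above, not anything fixed; skip_on_invalid_json is optional but avoids parse errors on lines where the field is not valid JSON):

json {
  source => "restOfLine"
  target => "parsedJson"
  skip_on_invalid_json => true
}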

Hi Badger,

Thanks for the reply. Our columns are set up like this:

YYYY-MM-DDTHH:MM:SSZ LOG_LEVEL TAG MESSAGE

where the fourth column, MESSAGE, is the JSON data.
Would your suggestion still work?

Thanks

Yes, replace the commas in the dissect filter with spaces.
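
With the columns you listed, a rough sketch (the field names are just placeholders) would be:

dissect { mapping => { "message" => "%{timestamp} %{log_level} %{tag} %{restOfLine}" } }

The last field soaks up the remainder of the line, spaces included, so the JSON stays in one piece for the json filter.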

Badger, I just want the message to be the %{restOfLine}, not all of the columns.
Update: I have 5 columns, with the JSON in the 5th.

Here is my current filter:

filter {
  dissect { mapping => { "message" => "%{timestamp} %{timestamp_iso} %{log_level} %{tag}  %{data}" } }
  date {
    match => [ "timestamp_iso" , "yyyy'.'MM'.'dd HH:mm:ss'.'SSS"]
    target => "@timestamp"
  }
  mutate {
    convert => ["timestamp", "integer"]
    add_field => {
      "file" => "%{[@metadata][s3][key]}"
    }
  }
  json {
    source => "data"
    target => "data"
    skip_on_invalid_json => true
  }
  grok {
    match => { "file" => "%{GREEDYDATA:device_id}_%{GREEDYDATA:log_time}.csv" }
  }
}
output not included

And I am getting this error message from Logstash:

[2019-07-04T19:04:41,046][WARN ][org.logstash.dissect.Dissector] Dissector mapping, pattern not found {"field"=>"message", "pattern"=>"%{timestamp} %{timestamp_iso} %{log_level} %{tag} %{data}", "event"=>{"@version"=>"1", "message"=>"1561434808030,2019.06.25 00:53:28.030,INFO,sensitive-data-sensitive-data,JSON: {\"data\":{\"type\":\"external features\",\"event\":\"App started\"},\"sensitive-data\":\"sensitive-data\",\"sensitive-data\":\"sensitive-data\",\"timestamp\":\"2019-06-25T03:53:27.778Z\",\"type\":\"externalFeature\",\"sensitive-data\":\"sensitive-data\"}\n", "tags"=>["_dissectfailure"], "@timestamp"=>2019-07-04T19:04:40.549Z}}

I know I didn't get something right; just looking for some more help :slight_smile:

Thank you

Update: The dissect matches when I put commas instead of spaces.
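
In case it helps anyone else, a minimal sketch of the comma-based mapping (same field names as in my filter above):

dissect { mapping => { "message" => "%{timestamp},%{timestamp_iso},%{log_level},%{tag},%{data}" } }

%{data} picks up everything after the fourth comma, so the commas inside the JSON no longer split it. If that field still carries the "JSON: " prefix visible in the log above, a mutate gsub before the json filter can strip it off.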

All good, that worked! Thanks for your help.
