.CSV input with JSON in it :(

Hey Team,

New here, and new to the ELK stack. My colleague and I have spent days trying to figure out a solution to our problem, but we have not come up with a resolution.

Background: Our company has tasked us with ingesting gigabytes' worth of .CSV files to be used with Elastic / Kibana.

Problem: The input is a .CSV file. The first three columns parse fine. The fourth column (and it is always the fourth) contains JSON data that is not properly escaped. Using the ',' delimiter obviously breaks the JSON column into multiple fields, anywhere from 4 to 48, depending on the number of commas in the JSON data.

So far we have looked into using a space as the delimiter, but this did not work either, as there are spaces in the JSON data.

Does anyone know how we can parse the JSON data and prevent the commas and unescaped quotes from causing failures / creating extra fields?

Thank you so much

Are there always four columns? If so, then you could use dissect rather than a csv filter, then use a json filter to parse the JSON:

dissect { mapping => { "message" => "%{col1},%{col2},%{col3},%{restOfLine}" } }
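
The json filter part could look roughly like this (the source and target names are just placeholders matching the dissect above, not anything fixed; skip_on_invalid_json is optional but avoids parse errors on lines where the field is not valid JSON):

json {
  source => "restOfLine"
  target => "parsedJson"
  skip_on_invalid_json => true
}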

Hi Badger,

Thanks for the reply. Our columns are set up like this:

YYYY-MM-DDTHH:MM:SSZ LOG_LEVEL TAG MESSAGE

where the fourth column, MESSAGE, is the JSON data.
Would your suggestion still work?

Thanks

Yes, replace the commas in the dissect filter with spaces.
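
With the columns you listed, a rough sketch (the field names are just placeholders) would be:

dissect { mapping => { "message" => "%{timestamp} %{log_level} %{tag} %{restOfLine}" } }

The last field soaks up the remainder of the line, spaces included, so the JSON stays in one piece for the json filter.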

Badger, I just want the message to be the %{restOfLine}, not all of the columns.
Update: I have 5 columns, with the JSON in the 5th.

Here is my current filter:

filter {
  dissect { mapping => { "message" => "%{timestamp} %{timestamp_iso} %{log_level} %{tag}  %{data}" } }
  date {
    match => [ "timestamp_iso" , "yyyy'.'MM'.'dd HH:mm:ss'.'SSS"]
    target => "@timestamp"
  }
  mutate {
    convert => ["timestamp", "integer"]
    add_field => {
      "file" => "%{[@metadata][s3][key]}"
    }
  }
  json {
    source => "data"
    target => "data"
    skip_on_invalid_json => true
  }
  grok {
    match => { "file" => "%{GREEDYDATA:device_id}_%{GREEDYDATA:log_time}.csv" }
  }
}
output not included

And I am getting this error message from Logstash:

[2019-07-04T19:04:41,046][WARN ][org.logstash.dissect.Dissector] Dissector mapping, pattern not found {"field"=>"message", "pattern"=>"%{timestamp} %{timestamp_iso} %{log_level} %{tag} %{data}", "event"=>{"@version"=>"1", "message"=>"1561434808030,2019.06.25 00:53:28.030,INFO,sensitive-data-sensitive-data,JSON: {\"data\":{\"type\":\"external features\",\"event\":\"App started\"},\"sensitive-data\":\"sensitive-data\",\"sensitive-data\":\"sensitive-data\",\"timestamp\":\"2019-06-25T03:53:27.778Z\",\"type\":\"externalFeature\",\"sensitive-data\":\"sensitive-data\"}\n", "tags"=>["_dissectfailure"], "@timestamp"=>2019-07-04T19:04:40.549Z}}

I know I didn't get something right; just looking for some more help :slight_smile:

Thank you

Update: The dissect matches when I put commas instead of spaces.
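
In case it helps anyone else, a minimal sketch of the comma-based mapping (same field names as in my filter above):

dissect { mapping => { "message" => "%{timestamp},%{timestamp_iso},%{log_level},%{tag},%{data}" } }

%{data} picks up everything after the fourth comma, so the commas inside the JSON no longer split it. If that field still carries the "JSON: " prefix visible in the log above, a mutate gsub before the json filter can strip it off.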

All good, that worked! Thanks for your help.
