CSV with double quotes, spaces, and special characters not inserting

I'm trying to insert my CSV report into Elasticsearch through Logstash, but the insert does not succeed.

My CSV data looks like this:

1,1234556,30-12-2022,frank.van,SAMPLE_PRODUCT,"[Name] Frank Van Puffelen JAVA.
[Area/Pin] San Francisco, CA 
[Region/Status/Identify] Android Plaltfrom
[Case#] Jira-01234
[Problem] Messaging app not booting.
[Staring Point] Google service for the notifications
[Evaluate] Cloud Messaging.
[Verification Mode] Local Device.
[Empname] Frank Van.

Domain:Cloud_S,Android:S_OS
***** Ticket Status : https://jenkins.company.com/job/889900112 *****
"
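For what it's worth, the record itself is valid RFC-4180-style CSV: the quoted Item_Details field may legally contain commas, newlines, and brackets. A quick check with Python's csv module (an abbreviated copy of the record, just for illustration; Python is not part of the Logstash pipeline) parses it as one row of six fields:

```python
import csv
import io

# Abbreviated copy of the report record: the sixth field is quoted,
# so its embedded commas and newlines are part of the field value.
record = (
    '1,1234556,30-12-2022,frank.van,SAMPLE_PRODUCT,"[Name] Frank Van Puffelen JAVA.\n'
    '[Area/Pin] San Francisco, CA\n'
    '[Empname] Frank Van.\n'
    '"\n'
)

rows = list(csv.reader(io.StringIO(record)))
print(len(rows), len(rows[0]))  # one logical row, six fields
```

So the parsing problem is not the field content itself; it is that the file input reads the file line by line, splitting this one logical record into many separate events.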

And my Logstash conf file is as follows.

input {
   file {
      path => "/home/user/logs/app.csv"
      start_position => "beginning"
      sincedb_path => "/dev/null"
   }
}
filter {
    csv {
        separator => ","
        columns => ["Sl.No", "Emp_ID", "Date", "Emp_Name", "Product_Name", "Item_Details"]
    }

}
output {
  elasticsearch {
    hosts => "localhost:9200"
    index => "java-app"
    document_type => "Emp_ID"
  }
  stdout{}
}

Since the Item_Details column contains double quotes, spaces between words, and special characters, I can't get it inserted successfully through Logstash.

  csv {
    separator => ","
    quote_char => "\""
    columns => ["Item_Details"]
  }
  
  mutate {
    gsub => ["Item_Details", "\s+", ""]
  }
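Note that the `gsub => ["Item_Details", "\s+", ""]` above cannot help: it deletes every run of whitespace, mangling the field content instead of fixing the parse. A quick Python check of what that substitution does:

```python
import re

text = "[Name] Frank Van Puffelen JAVA."
# \s+ matches every run of spaces/newlines, so all word breaks vanish.
print(re.sub(r"\s+", "", text))  # → [Name]FrankVanPuffelenJAVA.
```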

The filters above couldn't solve my case, so please let me know how I can insert my CSV data as-is into Elasticsearch.

You will need to use a multiline codec to combine all 13 lines of that "line" into a single event. If your "lines" always end with a " on its own line, then that might be as simple as

file {
  path => "/home/user/logs/app.csv"
  start_position => "beginning"
  sincedb_path => "/dev/null"
  codec => multiline {
      pattern => '^"'
      negate => "true"
      what => "next"
  }
}

However, if you ever have lines that do not spill over onto another line then that will not work. Without seeing a lot more of the data it is hard to give a solution.
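For intuition, the codec settings above (pattern `'^"'`, `negate => true`, `what => "next"`) mean: a line that does not start with a double quote belongs with the line(s) after it, so an event only closes on a line that does start with `"`. A rough Python sketch of that grouping logic (a simplification, not the actual codec implementation):

```python
import re

def group_events(lines, pattern=r'^"'):
    # Lines that do NOT match the pattern are buffered onto the next
    # line(s); a line that DOES match closes the current event.
    events, buf = [], []
    for line in lines:
        buf.append(line)
        if re.match(pattern, line):
            events.append("\n".join(buf))
            buf = []
    return events  # any trailing unterminated lines stay buffered

print(group_events(['1,123,"[Name] x', '[Case#] Jira-01234', '"']))
```

This sketch also exhibits the failure mode described below: a header row, or any record that never ends with a bare `"`, gets glued onto the record that follows it.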

@Badger - Now the CSV data is parsed, but the Logstash log shows the CSV column header as the value for each field. It is not loading the actual value for each header.

logstash         |                  "Emp_Name" => "Emp_Name",
logstash         |        "Product_Name" => "Product_Name",
logstash         |               "message" => "Sl.No,Emp_ID,Date,Emp_Name,Product_Name,Item_Details\r\n1,1234556,30-12-2022,frank.van,SAMPLE_PRODUCT,\"[Name] Frank Van Puffelen JAVA.\r\n[Area/Pin] San Francisco, CA \r\n[Region/Status/Identify] Android Plaltfrom\r\n[Case#] Jira-01234\r\n[Problem] Messaging app not booting.\r\n[Staring Point] Google service for the notifications\r\n[Evaluate] Cloud Messaging.\r\n[Verification Mode] Local Device.\r\n[Empname] Frank Van.\r\n\r\nDomain:Cloud_S,Android:S_OS\r\n***** Ticket Status : https://jenkins.company.com/job/889900112 *****\r",
logstash         |              "@version" => "1",
logstash         |                  "path" => "/home/user/logs/app.csv",
logstash         |          "Date" => "Date",
logstash         |           "Item_Details" => "Item_Details",
logstash         |     "Emp_ID" => "Emp_ID",
logstash         |            "@timestamp" => 2023-06-22T05:48:22.714Z,
logstash         |                  "tags" => [
logstash         |         [0] "multiline"
logstash         |     ],
logstash         |         "host" => "828967718f28",
logstash         |         "Sl.No" => "Sl.No"
logstash         | }

skip_header => "true" didn't help.

Look at your [message] field. It has a header row before the actual data you are interested in, so the csv filter has parsed the values from the header row and ignored the remaining columns.

As I said "if you ever have lines that do not spill over onto another line then that will not work". That is exactly what you have.

It might be possible to use an ad-hoc approach like using mutate+gsub to remove the header row and \r\n if they are present. Then again, it might not be a solvable problem.
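That mutate+gsub idea can be prototyped with a plain regex before putting it into the pipeline; the header string below is the one from this thread, and the exact pattern is an assumption:

```python
import re

message = ("Sl.No,Emp_ID,Date,Emp_Name,Product_Name,Item_Details\r\n"
           "1,1234556,30-12-2022,frank.van,SAMPLE_PRODUCT,\"[Name] Frank Van Puffelen JAVA.")

# Drop the header row when it is glued to the front of the event --
# the equivalent of a mutate { gsub } on [message] in the filter block.
header = r"^Sl\.No,Emp_ID,Date,Emp_Name,Product_Name,Item_Details\r?\n"
cleaned = re.sub(header, "", message)
print(cleaned.startswith("1,1234556"))  # → True
```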

Here is my updated logstash.conf file.

input {
  file {
    path => "/home/user/logs/app.csv"
    start_position => "beginning"
    sincedb_path => "/dev/null"
    codec => multiline {
      pattern => '^"'
      negate => "true"
      what => "next"
    }
  }
}

filter {
  if ![message] {
    drop {}
  } else {
    mutate {
      gsub => [
        "message", '"', '', 
        "message", "\r\n", '' 
      ]
    }
    csv {
      separator => ","
      skip_header => true
      columns => ["Sl.No", "Emp_ID", "Date", "Emp_Name", "Product_Name", "Item_Details"]
    }
  }
}

output {
  elasticsearch {
    hosts => "localhost:9200"
    index => "java-app"
    document_type => "Emp_ID"
  }
  stdout {
    codec => rubydebug
  }
}

Now when I run it through Logstash it produces the result below.

logstash         | {
logstash         |               "column7" => "1234556",
logstash         |                  "tags" => [
logstash         |         [0] "multiline"
logstash         |     ],
logstash         |          "Date" => "Date",
logstash         |               "message" => "Sl.No,Emp_ID,Date,Emp_Name,Product_Name,Item_Details,1234556,30-12-2022,frank.van,SAMPLE_PRODUCT,[Name] Frank Van Puffelen JAVA.[Area/Pin] San Francisco, CA [Region/Status/Identify] Android Plaltfrom[Case#] Jira-01234[Problem] Messaging app not booting.[Staring Point] Google service for the notifications[Evaluate] Cloud Messaging.[Verification Mode] Local Device.[Empname] Frank Van.Domain:Cloud_S,Android:S_OS***** Ticket Status : https://jenkins.company.com/job/889900112 *****\r",
logstash         |                  "path" => "/home/Emp_Name/logs/app.csv",
logstash         |                  "host" => "9807a3ce9ab4",
logstash         |              "column13" => "Android:S_OS***** Ticket Status : https://jenkins.company.com/job/889900112 *****",
logstash         |              "column10" => "SAMPLE_PRODUCT",
logstash         |              "@version" => "1",
logstash         |           "description" => "Item_Details",
logstash         |              "column11" => "[Name] Frank Van Puffelen JAVA.[Area/Pin] San Francisco",
logstash         |            "@timestamp" => 2023-06-24T02:20:45.496Z,
logstash         |                  "Emp_Name" => "Emp_Name",
logstash         |         "Sl.No" => "Sl.No",
logstash         |               "column8" => "30-12-2022",
logstash         |               "column9" => "frank.van",
logstash         |     "Emp_ID" => "Emp_ID",
logstash         |        "Product_Name" => "Product_Name",
logstash         |              "column12" => " CA [Region/Status/Identify] Android Plaltfrom[Case#] Jira-01234[Problem] Messaging app not booting.[Staring Point] Google service for the notifications[Evaluate] Cloud Messaging.[Verification Mode] Local Device.[Empname] Frank Van.Domain:Cloud_S"
logstash         | }

But that is not the expected result: the field values are not aligned correctly. Can you help me with a conf file to insert my CSV file through Logstash? It would be a great help.
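The column drift follows from the gsub in that config: it strips the double quotes out of [message] before the csv filter runs, so the commas inside Item_Details become field separators. A minimal illustration of the effect, using a hypothetical three-field line:

```python
import csv
import io

quoted = '1,frank.van,"San Francisco, CA"'
unquoted = quoted.replace('"', '')  # what gsub on the raw message does

with_quotes = next(csv.reader(io.StringIO(quoted)))
without_quotes = next(csv.reader(io.StringIO(unquoted)))
print(len(with_quotes), len(without_quotes))  # → 3 4
```

In other words, the quotes have to survive until after the csv filter has split the fields; only then is it safe to clean up Item_Details with mutate.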

Even with the filter below I had no success.

filter {
  csv {
    separator => ","
    quote_char => "\""
    skip_header => true
    columns => ["Sl.No", "Emp_ID", "Date", "Emp_Name", "Product_Name", "Item_Details"]
  }
  mutate {
    gsub => ["Item_Details", '"', '', "Item_Details", "\r\n", '']
  }
}

Any help is appreciated.

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.