Csv upload error in logstash


(Gautham) #1

Hi All,

We are been trying to upload csv into elasticsearch through logstash and have stuck up with an issue. The excel has a lot of special characters having that in place we cant upload the document into elasticsearch it always throws error

[2018-11-30T17:13:15,940][WARN ][logstash.filters.csv ] Error parsing csv {:field=>"message", :source=>"The incident occured due to wrong scheduling by Control M team.The Control M team is working on the issue\", AMS\r", :exception=>#<CSV::MalformedCSVError: Illegal quoting in line 1.>}

Any help will be appreciated
Gauti


(Lewis Barclay) #2

What is the character?


(Gautham) #3

hey @Eniqmatic it contains / # % () ' " .

Its actually a description field which contains almost all type of special characters

Thanks
Gauti


(Gabteni) #4

Hi Gauti,

I m new here and i've faced a lot of issues trying to push CSV data to Elastic with logstash, not sure that I m able to help, if you provide a sample of your first CSV line and your config file may be it'll be easier to figure out what's happening


(Lewis Barclay) #5

Yeah I thought so, this is going to cause issues because if you have quotes in the field then the filter thinks it is ending the filter. I would suggest removing the quotes.


(Gautham) #6

Here is the csv data

Incident_Number Configuration_item Opened_Date Description Assignment_group Parent_Company Contact_type Major_incident Caller Priority Resolved_Date Closed_Date Status Updated_by Closed_by Category Close_notes Support_Company
INC0070083 AMEIE 5/30/2018 17:17 ** With 3rd party Sempre ** Incorrect validation message when contracts have expired on the SPA list, AMS N/A Self-service FALSE Karthick Low 8/3/2018 14:56 8/8/2018 15:01 Closed system NA No action Cleared N/A
INC0061050 N/A 5/22/2018 10:23 +31625736794 can't call outside, mobile traffic class is 2A COM N/A Phone FALSE Gerards Low 5/22/2018 17:31 5/27/2018 18:00 Closed system NA Performance tuning Traffic signals changed N/A
INC0060108 IMONS 5/16/2018 8:55 \ce\rd_measurements\map1387.00\Fourposter\Mand coping from this or saving an excel file costs alot of time. INCIDENTS N/A Phone FALSE Mary Low 7/11/2018 13:02 7/11/2018 13:02 Closed system NA NA cleared N/A
INC0068767 SLK 5/11/2018 15:45 @ty0001 IS Mainframe does not give the selection SUPPLIERS N/A Phone FALSE Marce High 5/11/2018 22:31 7/11/2018 13:01 Closed system NA No action Issue fixed N/A
INC0054236 N/A 5/3/2018 14:11 : [Fault Call] BC-460487 : Path MUIDEN N/A Email FALSE Shankar Low 5/14/2018 12:52 5/19/2018 13:01 Closed system NA Hardware fix/Replacement patched N/A

Here is my config file

input {
  file {
    path => "/opt/installables/csv/new_folder/data.csv"
    start_position => "beginning"
   sincedb_path => "/dev/null"
     }
  }
filter {
  csv {
      separator => ","
      skip_empty_columns => true
     columns => ["Incident_Number","Configuration_item","Opened_Date","Description","Assignment_group","Parent_Company","Contact_type","Major_incident","Caller","Priority","Resolved_Date","Closed_Date","Status","Updated_by","Closed_by","Category","Close_notes","Support_Company"]
  }
}
output {
   elasticsearch {
     hosts => "1.1.1.3:9200"
     index => "data"
  }
#  stdout {
#  codec => rubydebug
#  }
}

Thanks
Gauti


(Rebmeister) #7

Hello Gauti,
Try to use Dissect plugin which may solve your problem, with the assumption that the separator character is not part of the field.

filter {

# extract the data
dissect {
	mapping => { "message" => "%{Incident_Number};%{Configuration_item};%{Opened_Date};%{Description}" }
    }        

}


(Gabteni) #8

Note sure of that but it seems that your csv strings are note quoted, could you show again the data but in text format ?

from the source we can see

source=>"The incident occured due to wrong scheduling by Control M team.The Control M team is working on the issue\", AMS\r"

I'm wrong maybe but if your fields are double quoted I guess data should look like this

source=>"The incident occured due to wrong scheduling by Control M team.The Control M team is working on the issue\", \"AMS\"\r"

So, if I'm not wrong strings seems to don'be quoted in your file.

And at the same time, if you do not provide any custom quote_char in your config file then your file will be treated as double quoted because it's the default value used by the csv plugin

quote_char => "\""

I guess you should double quote your fields especially if you have special chars within your data.

hope it helps


(Gautham) #9

dissect didnt help me @Michel99_7 still having the same problem

You are rite @lightonseo i think i need to do some rework on my csv

Thanks
Gauti


(Rebmeister) #10

Hello Gauti,

I tested it with the script below and it was working well.

  • I used this sample csv file (with ";" separator because commas are part of some fields)

Incident_Number;Configuration_item;Opened_Date;Description
INC0070083;AMEIE;5/30/2018 17:17;** With 3rd party Sempre ** Incorrect validation message when contracts have expired on the SPA list
INC0061050;N/A;5/22/2018 10:23;+31625736794 can't call outside, mobile traffic class is 2A
INC0060108;IMONS;5/16/2018 8:55;\ce\rd_measurements\map1387.00\Fourposter\Mand coping from this or saving an excel file costs alot of time.

  • and the script test.conf that uses the DISSECT plugin

input
{
# log cpdlc csv files
file {
path => "C:/config/test.csv"
#sincedb_path => "/dev/null"
start_position => "beginning"
discover_interval => 15
close_older => 60
mode => "tail"
}
}

filter
{
# extract the data
dissect {
mapping => { "message" => "%{Incident_Number};%{Configuration_item};%{Opened_Date};%{Description}" }
}
}

output {
stdout {codec => rubydebug}
}


(Gautham) #11

@Michel99_7 so we should not use csv filter instead we need to use dissect filter??

Thanks
Gauti


(Rebmeister) #12

Yes, if the separator is perfectly clear and is not used within your fields.
Moreover it seems that it is much more performant than the CSV plugin.


(Gautham) #13

Tats great @Michel99_7 this did the magic, i was able to upload the csv data, thank you....

and there was one small problem still i'm facing is, if there is any empty values in the csv logstash is not indexing those data, i have given index level malformed as well but still issue exist.
Any advice on this?

Thanks
Gauti


(Rebmeister) #14

Gauti, can you explain the problem with an example?


(Gautham) #15

There are few fields where the values are empty, below is an example

Incident_Number Configuration_item Opened_Date Description Assignment_group Parent_Company Contact_type Major_incident Caller Priority Resolved_Date Closed_Date Status Updated_by Closed_by Category Close_notes Support_Company
INC0213362 DC42 9/30/2018 23:54 Omnibus alert on DC42 Middleware IM Event FALSE Tivoli Interface Medium 10/1/2018 1:40 10/11/2018 2:00 Closed system No action Closing the incident, team is investigating with the Monitoring team. Infrastructure
INC0213361 IMG02 9/30/2018 23:46 Omnibus alert on IMG02 Wintel IM Event FALSE Tivoli Interface Medium 10/4/2018 2:00 10/11/2018 6:00 Closed system Close notes copied from Parent Incident: HW team already investigating the case.

In the above example, fields like closed_by and category has no values in it, its an empty string, logstash is unable to process these, in general to ignore these type of empty fields i usually use ignored_malformed:true in the mapping, when i'm using csv filter.

But now when using dissect filter looks like it not taking this mapping into consideration.

Thanks
Gauti


(Rebmeister) #16

Hello,

Just complete the dissect clause with missing fields with the separator (here ";").
Empty fields will normally be ingested with an empty string.

dissect {
	mapping => { "message" => "%{Incident_Number};%{Configuration_item};%{Opened_Date};%{Description};%{Closed_by};%{Category}" }
}

(Gautham) #17

By Using the separator(";") i'm not able to index any data its throwing a warning as

[2018-12-06T18:59:51,903][WARN ][org.logstash.dissect.Dissector] Dissector mapping, pattern not found

and only 7 of my documents are getting indexded

by using the separator(",") i'm still getting the same warning but 17k+ documents are getting indexed.

In total i'm expecting 22k+ documents to be indexed which is not happening, is this due to that warning which i'm getting above? looks like the documents which are giving warnings are not indexed.

There are two main fields "Description" & "Close_notes" which contains the special characters, if i remove these to columns all docs are getting indexed without any issue.

Thanks
Gauti


(Rebmeister) #18

Hello Gauti,
Can you attach your log file so that I can have a look?


(Gautham) #19

I dont see any error in log files its all the same warnings again.

FYI...

[2018-12-14T17:05:20,837][WARN ][org.logstash.dissect.Dissector] Dissector mapping, field found in event but it was empty {"field"=>"message", "event"=>{"host"=>"0.0.0.0", "tags"=>["_dissectfailure"], "@timestamp"=>2018-12-14T11:35:19.559Z, "@version"=>"1", "message"=>"", "path"=>"/opt/installables/csv/new_folder/data.csv"}}
[2018-12-14T17:05:20,837][WARN ][org.logstash.dissect.Dissector] Dissector mapping, pattern not found {"field"=>"message", "pattern"=>"%{Incident_Number},%{Configuration_item},%{Opened_Date},%(Description),%{Assignment_group},%{Parent_Company},%{Contact_type},%{Major_incident},%{Caller},%{Priority},%{Resolved_Date},%{Closed_Date},%{Status},%{Updated_by},%{Closed_by},%{Category},%(Close_notes),%{Support_Company}", "event"=>{"host"=>"0.0.0.0", "tags"=>["_dissectfailure"], "@timestamp"=>2018-12-14T11:35:19.559Z, "@version"=>"1", "message"=>"Hi James,", "path"=>"/opt/installables/csv/new_folder/data.csv"}}
[2018-12-14T17:05:20,837][WARN ][org.logstash.dissect.Dissector] Dissector mapping, field found in event but it was empty {"field"=>"message", "event"=>{"host"=>"0.0.0.0", "tags"=>["_dissectfailure"], "@timestamp"=>2018-12-14T11:35:19.559Z, "@version"=>"1", "message"=>"", "path"=>"/opt/installables/csv/new_folder/data.csv"}}
[2018-12-14T17:05:20,838][WARN ][org.logstash.dissect.Dissector] Dissector mapping, pattern not found {"field"=>"message", "pattern"=>"%{Incident_Number},%{Configuration_item},%{Opened_Date},%(Description),%{Assignment_group},%{Parent_Company},%{Contact_type},%{Major_incident},%{Caller},%{Priority},%{Resolved_Date},%{Closed_Date},%{Status},%{Updated_by},%{Closed_by},%{Category},%(Close_notes),%{Support_Company}", "event"=>{"host"=>"0.0.0.0", "tags"=>["_dissectfailure"], "@timestamp"=>2018-12-14T11:35:19.560Z, "@version"=>"1", "message"=>"We have added your name in required group. Now you have access on both mentioned databases.", "path"=>"/opt/installables/csv/new_folder/data.csv"}}

If i remove the Description & Closed_notes column from the csv all the data is getting uploaded properly, looks like there is some issue with these two field values.

Thought logstash will pickup watever raw data we insert in, looks like lot of things has to be done to get it uploaded.

Thanks
Gauti


(Rebmeister) #20

It seems that the message line is empty, and that's why dissect doesn't work.
Test the message field. drop the event if it is empty before calling DISSECT command.