Issue parcing csv file using Input csv plugin - ELK 5.5.0


(Haree Krushnan) #1

I am trying to breakdown a csv file with incident data using the csv input filter and have run into the following error. I had used a multiline codec to combine multiple lines into a single line (multiple lines within a single column of the csv).

The config file i had come up with is :slight_smile:

input {
        file {
                path => "/home/elkuser/KP/IN_DATA/incident/sample/KP_Incident_test2.csv"
                start_position => "beginning"
                sincedb_path => "/dev/null"
                codec => multiline {
                        pattern => "^(INC)"
                        negate => "true"
                        what => "previous"
                        max_lines => 4000
                }
              }
}


filter{
        csv{
                columns => ["INCIDENTNUM","CREATEDTTM","FULLNAME","LASTRESOLVEDDTTM","CLOSEDTTM","NAME","STATUS","DETAIL","SUMMARY","#_DAYS_REPORT_VS_RESOLVED_DT","#_HOURS_REPORT_VS_RESOLVED_DT","#_DAYS_REPORTED_VS_CLOSED_DT","TIER1","TIER2","TIER3","TIER4","RESOLUTION","LOGINID","LOGINID","SRCMODDTTM","REPORTDTTM","SERVICE_TYPE","STATUS_REASON","OWNER_GROUP","CATEGORIZATION_TIER_1","CATEGORIZATION_TIER_2","CATEGORIZATION_TIER_3","RESOLUTION_CATEGORY","RESOLUTION_CATEGORY_TIER_2","RESOLUTION_CATEGORY_TIER_3","PRIORITY","CI"]
                separator => ","
                convert => { "#_DAYS_REPORT_VS_RESOLVED_DT" => "integer" "#_HOURS_REPORT_VS_RESOLVED_DT" => "integer" "#_DAYS_REPORTED_VS_CLOSED_DT" => "integer" }
        }
}

output {
        file{
              path => "/home/elkuser/KP/OUT_DATA/H-OUT.csv"
              codec => line
        }
        stdout {
                        codec => rubydebug
                        }
        }

21:56:24.725 [[main]>worker0] WARN logstash.filters.csv - Error parsing csv {:field=>"message", :source=>"INC000015663654,2017-01-21-07.05.34.000000,Gani Vinay,2017-01-24-07.03.41.000000,2017-02-09-02.02.04.000000,WINT CC,Closed,\"Summary: mws002.crdc.ap.org is unreachable. The host has failed to respond to the ping request.\nDate: Jul 29,2016 10:28 CUT\tSeverity: Minor\nResourceId: mws002.crdc.ap.org\nTicketGroup: WINT CC\tCustomerCode: kph\nInstanceId: 094.13.40.244\nInstanceValue: unreachable\nInstanceSituation: Node Status\nComponentType: ComputerSystem\nComponent: NodeAvail\nSubComponent: Ping\nNode: mws002.crdc.ap.org\nAlertKey\",mws002.crdc.ap.org(172.21.40.244) is unreachable. The host has failed to respond,2,71,18,Software,Infrastructure,Utility,Tivoli Monitoring,Server is Decommission WO0000007223546.,E388276,E388276,2017-03-12-16.35.39.000000,2017-01-21-07.05.33.000000,3,\"19,000\",IPCENTER AUTO,,,,Operating System,,,Medium,\r", :exception=>#<CSV::MalformedCSVError: Unquoted fields do not allow \r or \n (line 1).>}


(Haree Krushnan) #2

I managed to fix this, by adding a ruby function to the above config!.

ruby {
                init => '
                        #Function for combining multiple Syslog records in to one record.
                        def removecharacters(message)
                               begin
                                        result = "";
                                        result=message.tr! "\s\r\n\t\f", "";
                                        return result;
                               rescue Exception => e
                                        puts "Exception raised Inside the removecharacters method";
                                        puts e.message;
                                        puts e.backtrace.inspect;
                               end
                         end
                        '
                code => "msg = removecharacters(event.get('message'));
                         event.set('message',msg);"
        }

(Mark Walkom) #3

Thanks for sharing this solution! The CSV filter doesn't really work well with CRLFs in fields so this is super handy.

Also, we’ve renamed ELK to the Elastic Stack, otherwise Beats feels left out :wink:


(Guy Boertje) #4

you could use mutate gsub before the csv filter to replace all whitespace with one space.


(Haree Krushnan) #5

Hi,
Can you help me with a sample of how it can be done with mutate?. I am new to Elastic Stack and hence this would be a good starting point to make better use of the available options.


(Guy Boertje) #6

Try this:

filter {
  mutate {
       gsub => ["[message]", "\s", " "]
     }
  }
  csv{
      columns => ["INCIDENTNUM","CREATEDTTM","FULLNAME","LASTRESOLVEDDTTM","CLOSEDTTM","NAME","STATUS","DETAIL","SUMMARY","#_DAYS_REPORT_VS_RESOLVED_DT","#_HOURS_REPORT_VS_RESOLVED_DT","#_DAYS_REPORTED_VS_CLOSED_DT","TIER1","TIER2","TIER3","TIER4","RESOLUTION","LOGINID","LOGINID","SRCMODDTTM","REPORTDTTM","SERVICE_TYPE","STATUS_REASON","OWNER_GROUP","CATEGORIZATION_TIER_1","CATEGORIZATION_TIER_2","CATEGORIZATION_TIER_3","RESOLUTION_CATEGORY","RESOLUTION_CATEGORY_TIER_2","RESOLUTION_CATEGORY_TIER_3","PRIORITY","CI"]
      separator => ","
      convert => { "#_DAYS_REPORT_VS_RESOLVED_DT" => "integer" "#_HOURS_REPORT_VS_RESOLVED_DT" => "integer" "#_DAYS_REPORTED_VS_CLOSED_DT" => "integer" }
  }
}

(system) #7

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.