Logstash File Input Plugin and Windows Line Endings


(tdsacilowski) #1

Hello,

I'm using the following Logstash (v5.1) conf file to ingest data, and I'm running into an issue where it doesn't seem to recognize Windows line endings (\r\n) as a row separator:

input {
    file {
        path => "/root/taiga-stories"
        codec => multiline {
            pattern => "^ref,|%{TIMESTAMP_ISO8601}(?:,[^,]*){7},(?:\"[\d,]+\"){0,1},[^,]*,\[.*\],\d+$"
            negate => "true"
            what => "next"
        }
        start_position => "beginning"
        sincedb_path => "/dev/null"
    }
}

filter {
    csv {
        separator => ","
        columns => ["ref","subject","description","sprint","sprint_estimated_start","sprint_estimated_finish","owner","owner_full_name","assigned_to","assigned_to_full_name","status","is_closed","back-points","design-points","front-points","security-points","ux-points","total-points","backlog_order","sprint_order","kanban_order","created_date","modified_date","finish_date","client_requirement","team_requirement","attachments","generated_from_issue","external_reference","tasks","tags","watchers","voters"]
    }
    if [ref] == "ref" {
        drop { }
    }
}

output {
    #elasticsearch {
    #    hosts => ["172.31.16.200:9200"]
    #    index => "taiga-stories-test"
    #}
    stdout {
        codec => rubydebug
    }
}
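With negate => "true" and what => "next", a line that does NOT match the pattern is held back and joined to the line after it. An event therefore ends only on a line that does match: either the header row (^ref,) or the last line of a record (the trailing ,[...],<digits>). Below is a rough Python model of that grouping. It is not the codec's actual implementation. The pattern is a hypothetical simplification that drops the TIMESTAMP_ISO8601 and column-count parts, and the sample lines are shortened.

```python
import re

# Hypothetical, simplified stand-in for the config's pattern: keeps only the
# header check and the ",[...],<digits>" tail that marks the end of a record.
END_OF_RECORD = re.compile(r"^ref,|,\[.*\],\d+$")


def group_lines(lines, pattern):
    """Rough model of multiline with negate => true, what => "next"."""
    events, pending = [], []
    for line in lines:
        pending.append(line)      # every line is buffered first
        if pattern.search(line):  # a matching line completes the event
            events.append("\n".join(pending))
            pending = []
    # Simplification: flush leftovers at end of input; the real codec holds
    # them until more input arrives or the codec is flushed.
    if pending:
        events.append("\n".join(pending))
    return events


lines = [
    "ref,subject,description",                          # header row
    '1,Clam AV Repository,"Provide Clam AV Repository',  # quoted field continues...
    'Customer: Cyber",,[],0',                            # ...and the record ends here
    '9,Meet with AIT .net PaaS,"Output",,[],0',          # single-line record
]

for event in group_lines(lines, END_OF_RECORD):
    print(repr(event))
```

The multi-line description ends up in a single event, joined with "\n". This matches the embedded \n characters visible in the message field of the output that follows.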

When I tested with a small input file created by copy/paste, everything worked as expected, but with the full input file Logstash failed. Here's a snippet of the message field (the entire field is quite large) from the failed attempts:

   "message" => "1,Clam AV Repository,\"Provide Clam AV Repository across all environments and deliver Clam AV App via PCF\nCustomer: Cyber\nUser Story identified on: 12/13/16\",,,,d.h,D H,d.z,D Z,New,False,5.0,,,,,5.0,37,20,1481817781908,2016-12-15 16:03:01.919008+00:00,2017-01-27 13:19:47.857537+00:00,,False,False,0,,,\"2,3,4,5,6,7,26,27\",,[],0\r\n9,Meet with AIT .net PaaS,\"Output is to have a list of viable .NET applications candidates that can be deployed via PCF; user story from 12/16 PaaS Brown Bag\nCustomer: B B\nRequested on: 12/16/16\",06 Jan 2017,2016-12-20,2017-01-06,d.h,D H,d.h,D H,New,True,1.0,,,,,1.0,77,31,1481913643419,2016-12-16 18:40:43.431810+00:00,2016-12-21 18:24:51.681500+00:00,2017-01-05 16:47:05.664772+00:00,False,False,0,,,\"28,80\",,[],0\r\n10,Docker Registry Pipeline in .IO...[rest of message removed]",
  "@version" => "1",
      "tags" => [
        [0] "multiline",
        [1] "_csvparsefailure"
    ],
      "path" => "/root/new-taiga-stories",
      "host" => "[redacted]"

The only difference I noticed between my test input and the actual file was that the latter uses Windows line endings (\r\n). That leads me to believe the file input plugin isn't recognizing them. There's a short thread on this same issue ("\r\n as a row separator"), but it has no resolution. Is there a way to define which line ending to break on, or any other suggestion for resolving this through my conf file?

Thanks!


(tdsacilowski) #2

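Just an update: I figured out why my original config was failing with Windows line endings. I had missed the fact that the end-of-line regex anchor '$' matches only before '\n', not before '\r'. Lines are split on '\n', so each line of a Windows file still ends in '\r'. As a result, the '\d+$' at the end of my pattern never matched, and the multiline codec kept appending lines to the same event. Allowing an optional '\r' before the anchor fixes it. My updated pattern: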

"^ref,|%{TIMESTAMP_ISO8601}(?:,[^,]*){7},(?:\"[\d,]+\"){0,1},[^,]*,\[.*\],\d+\r{0,1}$"
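The effect of the extra \r{0,1} can be checked in a few lines of Python, whose '$' (like Ruby's) does not match just before a '\r'. This is a minimal sketch under two assumptions. First, lines are split on "\n" only, so the '\r' of each CRLF pair is left at the end of the line. Second, only the tail of each pattern is used, as a simplification.

```python
import re

# Two rows as they might appear in a file with Windows (CRLF) line endings.
raw = "1,Clam AV Repository,,[],0\r\n9,Meet with AIT .net PaaS,,[],0\r\n"

# Assumption: split on "\n" only, so each line keeps its trailing "\r".
lines = raw.split("\n")[:-1]  # drop the empty string after the final "\n"

# Simplified tails of the original and updated patterns.
old_tail = re.compile(r",\[.*\],\d+$")         # "$" directly after the digits
new_tail = re.compile(r",\[.*\],\d+\r{0,1}$")  # optional "\r" before "$"

for line in lines:
    # Prints the line, then whether the old and new tails match it.
    print(repr(line), bool(old_tail.search(line)), bool(new_tail.search(line)))
```

The old tail fails on both lines and the new tail matches both. \r{0,1} is equivalent to \r?, and because the '\r' is optional, the updated pattern still matches files with Unix line endings.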
