How to get Grok Filter to see Newline and Carriage Returns?


(jeremiah adams) #1

I originally posted this on Stack Overflow, but I think I am unlikely to get an answer there, so I am posting here.

I am trying to parse our log files and send them to Elasticsearch. The problem is that our S3 client injects lines into the file that contain carriage returns ('\r') instead of newline characters ('\n'). The File input is configured with '\n' as the delimiter, which is consistent with 99% of the data. When I run Logstash against this data, it misses the last line, which is the one I am really looking for, because the File input treats the '\r' characters as ordinary text rather than line breaks. To get around this I am using a Mutate filter to rewrite the '\r' characters to '\n'. The mutate works, but Grok still sees the result as one big line and tags it with _grokparsefailure.

I expect to toss out the lines containing the '\r' garbage and only parse the lines that look like a normal log4j entry. The problem is that the key line I need is munged in with that garbage, and the Mutate filter does not cause the new '\n' characters to be re-evaluated as event boundaries.

Config

input {
    file {
        path => "/home/pa_stg/runs/2015-12-09-cron-1449666001/run.log"
        start_position => "beginning"
        sincedb_path => "/data/logstash/sincedb"
        stat_interval => 300
        type => "spark"
    }
}
filter {
    mutate {
        # the replacement string is a literal newline
        gsub => ["message", "\r", "
"]
    }
    grok {
        match => {"message" => "\A%{DATE:date} %{TIME:time} %{LOGLEVEL:loglevel} %{SYSLOGPROG}%{GREEDYDATA:data}"}
        break_on_match => false
    }
}
output{
    stdout { codec => rubydebug }
}

## Input
This sample from the input file illustrates the problem. The ^M characters are how vim displays the '\r' carriage returns ('more' hides most of them). I left the line as-is so you can see that the whole thing is treated by Linux and the File plugin as a single line of text. I am trimming this input due to the forum's size limits.

^M[Stage 79:=======>                                               (30 + 8) / 208]^M[Stage 79:============>                                          (49 + 8) / 208]^M[Stage 79:=================>                                     (65 + 8) / 208]^M[Stage 93:================================================>     (186 + 6) / 208]^M[Stage 93:=====================================================>(206 + 2) / 208]^M                                                                                ^M15/12/09 13:03:46 INFO SomethingProcessor$: Something Processor completed
15/12/09 13:04:44 INFO CassandraConnector: Disconnected from Cassandra cluster: int

## Output
Apologies for the formatting, but it is butchered in the output as well. The key point is that "message" should only contain the "15/12/09 13:03:46 INFO SomethingProcessor$: Something Processor completed" line. I am trimming most of the output due to the forum's size limits.

{
   "message" => "\n[Stage 79:=======>                                               (30 + 8) / 208]\n[Stage 79:============>
                         (49 + 8) / 208]\n[Stage 79:=================>                                     (65 + 8) / 208]\n[Stage 93:=====================================================>(206 + 2) / 208]\n
                                                             \n15/12/09 13:03:46 INFO SomethingProcessor$: Something Processor com
pleted",
        "@version" => "1",
        "@timestamp" => "2015-12-09T22:16:52.898Z",
        "host" => "ip-10-252-1-225",
        "path" => "/home/something/pa_stg/runs/2015-12-09-cron-1449666001/run.log",
        "type" => "spark",
        "tags" => [
        [0] "_grokparsefailure"
    ]
}

(Craig Schotke) #2

Have you tried using the split filter? https://www.elastic.co/guide/en/logstash/current/plugins-filters-split.html

I think you should be able to split the event on '\r', and then each resulting line would be re-processed as a separate event.
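A minimal sketch of that idea (the terminator value is an assumption; on Logstash versions that do not process escape sequences in config strings, you may need to paste a literal carriage return between the quotes instead of "\r"):

filter {
    split {
        # splits the single multi-line event into one event per segment;
        # "message" is the split filter's default field
        field      => "message"
        terminator => "\r"
    }
}

Each new event then flows through the rest of the filter chain, so a grok that matches a normal log4j line should succeed on the segment you care about, and you can drop the progress-bar segments that fail to match.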


(system) #3