_grokparsefailure on file parsing


#1

Hi,

I have a multiline log file structured something like this:

<Aug 02, 2016 11:21:05:049 AM> <dataa> <datab> <datac> <datad> <datae> <dataf> <datag>
 <datah>

I have set the multiline pattern in Filebeat like this:

multiline.pattern: '^<[A-Za-z_]{3} [[:digit:]]{2}, [[:digit:]]{4} ([[:digit:]]{1}|[[:digit:]]{2}):[[:digit:]]{2}:[[:digit:]]{2}:([[:digit:]]{1}|[[:digit:]]{2}|[[:digit:]]{3}) [A-Z]{2}>'
multiline.negate: true
multiline.match: after
multiline.max_lines: 5000
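
For readers unfamiliar with these settings, here is a rough Python sketch of what negate: true combined with match: after does: any line that does not match the pattern is appended to the preceding line that did. This is only an illustration, not Filebeat's implementation. The function name group_events and the second sample event are made up, and the POSIX classes such as [[:digit:]] are rewritten as \d because Python's re module does not support them.

```python
import re

# Python translation of the multiline.pattern above (assumption: \d stands in
# for [[:digit:]], and the alternations are collapsed to {1,2} / {1,3}).
START = re.compile(
    r'^<[A-Za-z_]{3} \d{2}, \d{4} \d{1,2}:\d{2}:\d{2}:\d{1,3} [A-Z]{2}>'
)

def group_events(lines):
    """Hypothetical helper: emulate negate: true + match: after."""
    events = []
    for line in lines:
        if START.match(line) or not events:
            # A line matching the pattern starts a new event.
            events.append(line)
        else:
            # A non-matching line is appended to the previous event.
            events[-1] += "\n" + line
    return events

sample = [
    "<Aug 02, 2016 11:21:05:049 AM> <dataa> <datab> <datac> <datad> <datae> <dataf> <datag>",
    " <datah>",
    "<Aug 02, 2016 11:21:06:100 AM> <dataa>",  # invented second event
]
events = group_events(sample)
for event in events:
    print(repr(event))
```

With this sample, the continuation line " <datah>" ends up in the same event as the timestamped line before it, joined by a newline.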

Filebeat pushes this data to Kafka, and Logstash then consumes from it. The problem is that Filebeat writes \u003c and \u003e instead of < and > in the message, which causes a _grokparsefailure in my Logstash. The encoding is set to UTF-8 (encoding: utf-8).

How can I fix this?


(Magnus Bäck) #2

It seems unlikely that Logstash's JSON deserializer wouldn't translate \u003c to <. Please show your grok filter and what a failed event looks like. Use a stdout { codec => rubydebug } output and copy/paste its output.


#3
grok {
	match => { "message" => "<(?<timestamp>%{MONTH} %{MONTHDAY}, 20%{YEAR} %{HOUR}:?%{MINUTE}(?::?%{SECOND}) (?:AM|PM))\> <%{GREEDYDATA:dataa}> <%{GREEDYDATA:datab}> <%{GREEDYDATA:datac}> <%{GREEDYDATA:datad}> <%{GREEDYDATA:datae}> <%{GREEDYDATA:dataf}> <%{GREEDYDATA:datag}>\n <%{GREEDYDATA:datah}>" }
}

As for the output message, you can think of it as the same text with \u003c and \u003e instead of < and >. It was working fine with version 2, but I have just switched to v5 and am testing on it.


#4

I don't think it's a problem with Logstash, because the Filebeat log also shows \u003c and \u003e.


(ruflin) #5

Sounds like https://github.com/elastic/beats/issues/2581 ?


(Magnus Bäck) #6

In JSON, \u003c and < are equivalent so it's totally fine for Filebeat to use \u003c instead of <. At least Logstash 2.4 handles this just fine:

$ cat test.config 
input { stdin { codec => json } }
output { stdout { codec => rubydebug } }
$ echo '{"message": "\u003cfoo\u003e"}' | /opt/logstash/bin/logstash -f test.config
Settings: Default pipeline workers: 8
Pipeline main started
{
       "message" => "<foo>",
      "@version" => "1",
    "@timestamp" => "2016-11-03T09:51:09.244Z",
          "host" => "lnxolofon"
}
Pipeline main has been shutdown
stopping pipeline {:id=>"main"}
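
The same equivalence can be checked outside Logstash with any standards-compliant JSON parser. A minimal sketch using Python's standard json module (the variable names are only illustrative):

```python
import json

# Raw string so that \u003c reaches the JSON parser as a six-character escape,
# exactly as it appears on the wire.
escaped = json.loads(r'{"message": "\u003cfoo\u003e"}')
literal = json.loads('{"message": "<foo>"}')

# Both documents decode to the same value.
print(escaped["message"])
print(escaped == literal)
```

Both inputs decode to the same string, so the escaping by itself should not change what a filter sees once the event has been parsed as JSON.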



#8

I tried with just the console output, and this is what I got:

{
  "@timestamp": "2016-11-03T11:19:52.393Z",
  "beat": {
    "hostname": "localhost",
    "name": "localhost",
    "version": "5.0.0"
  },
  "input_type": "log",
  "message": "\u003cNov 02, 2016 10:49:42:810 AM\u003e \u003cdataa\u003e \u003cdatab\u003e \u003cdatac\u003e \u003cdatad\u003e \u003cdatae\u003e \u003cdataf\u003e \u003cdatag\u003e\n \u003cdatah\n\u003e",
  "offset": 95625,
  "source": "/logfiles/logfile.log",
  "type": "log"
}
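
As a quick check of what a consumer would see after JSON decoding, the "message" field above can be run through a JSON parser. This is a sketch with Python's json module, not part of the original setup. It also makes visible that the decoded text ends with a newline before the final >, which may be worth comparing against the grok pattern, since GREEDYDATA is defined as .* and a plain . does not normally match a newline.

```python
import json

# The "message" field exactly as printed in the console output above.
raw = (
    r'{"message": "\u003cNov 02, 2016 10:49:42:810 AM\u003e \u003cdataa\u003e '
    r'\u003cdatab\u003e \u003cdatac\u003e \u003cdatad\u003e \u003cdatae\u003e '
    r'\u003cdataf\u003e \u003cdatag\u003e\n \u003cdatah\n\u003e"}'
)
message = json.loads(raw)["message"]

# repr() keeps the embedded newlines visible as \n.
print(repr(message))
```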

(ruflin) #9

Is there a reason you also opened "Filebeat error. Returns unicode character code instead of symbol"?


(system) #10

This topic was automatically closed after 21 days. New replies are no longer allowed.