_grokparsefailure on file parsing

elasticheart · November 3, 2016, 8:30am

Hi,

I have a multi-line log file structured something like this:

<Aug 02, 2016 11:21:05:049 AM> <dataa> <datab> <datac> <datad> <datae> <dataf> <datag>
 <datah>

I have set the multiline pattern in Filebeat like this:

multiline.pattern: '^<[A-Za-z_]{3} [[:digit:]]{2}, [[:digit:]]{4} ([[:digit:]]{1}|[[:digit:]]{2}):[[:digit:]]{2}:[[:digit:]]{2}:([[:digit:]]{1}|[[:digit:]]{2}|[[:digit:]]{3}) [A-Z]{2}>'
multiline.negate: true
multiline.match: after
multiline.max_lines: 5000
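
To illustrate what these settings do: with negate: true and match: after, any line that does not match the pattern is appended to the preceding line that does. Below is a rough Python sketch of that grouping, not Filebeat code. It assumes `\d` as a stand-in for `[[:digit:]]` (Python's re module has no POSIX classes), `group_events` is a hypothetical helper, and the second log entry is made up for the example.

```python
import re

# Assumed translation of the Filebeat pattern above: [[:digit:]] becomes \d,
# and the one/two/three-digit alternations become \d{1,2} and \d{1,3}.
START = re.compile(r"^<[A-Za-z_]{3} \d{2}, \d{4} \d{1,2}:\d{2}:\d{2}:\d{1,3} [A-Z]{2}>")


def group_events(lines):
    """Mimic negate: true / match: after. A line that does NOT match the
    pattern is appended to the preceding line that did match."""
    events = []
    for line in lines:
        if START.match(line) or not events:
            events.append(line)        # a matching line starts a new event
        else:
            events[-1] += "\n" + line  # continuation of the current event
    return events


sample = [
    "<Aug 02, 2016 11:21:05:049 AM> <dataa> <datab> <datac> <datad> <datae> <dataf> <datag>",
    " <datah>",
    # Hypothetical second entry, added only to show where a new event begins.
    "<Aug 02, 2016 11:21:06:001 AM> <dataa> <datab> <datac> <datad> <datae> <dataf> <datag>",
    " <datah>",
]
events = group_events(sample)
print(len(events))      # 2
print(repr(events[0]))  # first event, with "\n <datah>" joined onto it
```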

This pushes data to Kafka, and Logstash then consumes from it. The problem is that Filebeat writes \u003c and \u003e instead of < and > in the message, which causes a _grokparsefailure in my Logstash. The encoding is set to UTF-8 (encoding: utf-8).

How can I fix this?

magnusbaeck · November 3, 2016, 8:34am

It seems unlikely that Logstash's JSON deserializer wouldn't translate \u003c to <. Please show your grok filter and what a failed event looks like. Use a stdout { codec => rubydebug } output and copy/paste its output.

elasticheart · November 3, 2016, 8:39am

grok {
	match => { "message" => "<(?<timestamp>%{MONTH} %{MONTHDAY}, 20%{YEAR} %{HOUR}:?%{MINUTE}(?::?%{SECOND}) (?:AM|PM))\> <%{GREEDYDATA:dataa}> <%{GREEDYDATA:datab}> <%{GREEDYDATA:datac}> <%{GREEDYDATA:datad}> <%{GREEDYDATA:datae}> <%{GREEDYDATA:dataf}> <%{GREEDYDATA:datag}>\n <%{GREEDYDATA:datah}>" }
}

As for the output message, think of it as the same text with \u003c and \u003e instead of < and >. It was working fine with version 2, but I have just switched to v5 and am testing on it.

elasticheart · November 3, 2016, 8:45am

It's not a problem with Logstash, I think, because in the Filebeat log it's also \u003c and \u003e.

ruflin · November 3, 2016, 9:49am

Sounds like https://github.com/elastic/beats/issues/2581 ?

magnusbaeck · November 3, 2016, 9:51am

In JSON, \u003c and < are equivalent so it's totally fine for Filebeat to use \u003c instead of <. At least Logstash 2.4 handles this just fine:

$ cat test.config 
input { stdin { codec => json } }
output { stdout { codec => rubydebug } }
$ echo '{"message": "\u003cfoo\u003e"}' | /opt/logstash/bin/logstash -f test.config
Settings: Default pipeline workers: 8
Pipeline main started
{
       "message" => "<foo>",
      "@version" => "1",
    "@timestamp" => "2016-11-03T09:51:09.244Z",
          "host" => "lnxolofon"
}
Pipeline main has been shutdown
stopping pipeline {:id=>"main"}
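
The same equivalence can be checked outside Logstash. Here is a minimal Python sketch, assuming only the standard json module; the variable names are illustrative and not taken from the posts above.

```python
import json

# Raw strings keep the backslashes, so the JSON parser (not Python) sees \u003c.
escaped = json.loads(r'{"message": "\u003cfoo\u003e"}')
literal = json.loads('{"message": "<foo>"}')

print(escaped["message"])  # <foo>
print(escaped == literal)  # True
```

Any conforming JSON parser should decode both spellings to the same string, so the escaped form on the wire should not by itself change what a consumer sees after parsing.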

See also:

elasticheart · November 3, 2016, 11:39am

I have tried with just console output and this is what I got:

{
  "@timestamp": "2016-11-03T11:19:52.393Z",
  "beat": {
    "hostname": "localhost",
    "name": "localhost",
    "version": "5.0.0"
  },
  "input_type": "log",
  "message": "\u003cNov 02, 2016 10:49:42:810 AM\u003e \u003cdataa\u003e \u003cdatab\u003e \u003cdatac\u003e \u003cdatad\u003e \u003cdatae\u003e \u003cdataf\u003e \u003cdatag\u003e\n \u003cdatah\n\u003e",
  "offset": 95625,
  "source": "/logfiles/logfile.log",
  "type": "log"
}
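
For reference, the message field of that event can be decoded with any JSON parser to see the text a consumer would receive. This is a small Python sketch, assuming the standard json module; `raw` is just the message value copied from the output above.

```python
import json

# The "message" value from the console output, kept as a raw JSON string literal.
raw = (
    r'"\u003cNov 02, 2016 10:49:42:810 AM\u003e \u003cdataa\u003e \u003cdatab\u003e '
    r'\u003cdatac\u003e \u003cdatad\u003e \u003cdatae\u003e \u003cdataf\u003e '
    r'\u003cdatag\u003e\n \u003cdatah\n\u003e"'
)

message = json.loads(raw)
# Print each decoded line with repr() so leading spaces stay visible.
for line in message.splitlines():
    print(repr(line))
```

After decoding, the angle brackets are ordinary < and > characters. The decoded text spans three lines, with a line break between datah and its closing >, which differs from the sample at the top of the thread.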

ruflin · November 3, 2016, 2:59pm

Is there a reason you also opened "Filebeat error. Returns unicode character code instead of symbol"?

system · November 24, 2016, 8:31am

This topic was automatically closed after 21 days. New replies are no longer allowed.
