_grokparsefailure for Apache access logs sent via Fluent Bit; works without Fluent Bit and in the grok debugger

Hello,
I'm getting _grokparsefailure when parsing access logs shipped by Fluent Bit, although the same pattern parses them fine in the grok debugger.

Sample logs:

22.244.133.97 : 22.244.133.97 - - [09/Apr/2019:11:04:15 +0000] GET /app/include/style.css HTTP/1.1 200 6575 https://dev01-app-module-dev.roanprd-openshift.intra.absa.co.za/app/logoutNoFrames.do Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/55.0.2883.87 Safari/537.36 FHOnORPsyksg9FoyBiCUM92k 0.002
22.244.133.97 : 22.244.133.97 - vaibhav@managedmodule.india.company.org [09/Apr/2019:10:53:48 +0000] POST /app/secure/home.do HTTP/1.1 200 12647 https://dev01-app-module-dev.roanprd-openshift.intra.absa.co.za/app/login.jsp Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/55.0.2883.87 Safari/537.36 ZfQaQznAYIx0NA-HZ4hNkx9d 4.645

Config file:

filter {
  if [application] == "access" {
    grok {
      break_on_match => false
      match => {
        "message" => ["(?:%{IP}|%{HOSTNAME}) : (?:%{IP}|%{HOSTNAME})\s*[-/]\s*(?:(?[a-zA-Z][a-zA-Z0-9_.+-=:]+@%{HOSTNAME})|-)\s*[%{MONTHDAY:[@metadata][day]}/%{MONTH:[@metadata][month]}/%{YEAR:[@metadata][year]}[:\s\w+]*]\s"]
      }
    }
    # Convert textual month to a number
    ruby {
      "code" => "event.set('[@metadata][month]',Date::ABBR_MONTHNAMES.index(event.get('[@metadata][month]')));"
    }
  }
}

rubydebug output:

{
    "date" => 1554807229.693926,
    "image" => "module-app",
    "headers" => {
        "content_type" => "application/json",
        "request_path" => "/",
        "http_version" => "HTTP/1.1",
        "request_method" => "POST",
        "https" => "https",
        "http_host" => "zaddnnapp0004.dcorp.msarena.com:9443",
        "content_length" => "1086",
        "request_uri" => "/"
    },
    "instance" => "module-app-dev01-5-6bjll",
    "log" => "22.244.133.97 : 22.244.133.97 - vaibhav@managedmodule.india.company.org [09/Apr/2019:10:53:48 +0000] POST /app/secure/home.do HTTP/1.1 200 12647 https://dev01-app-module-dev.roanprd-openshift.intra.absa.co.za/app/login.jsp Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/55.0.2883.87 Safari/537.36 ZfQaQznAYIx0NA-HZ4hNkx9d 4.645",
    "@metadata" => {
        "year" => "unknown",
        "month" => 0,
        "day" => "unknown"
    },
    "log_level" => "INFO",
    "tags" => [
        [0] "_grokparsefailure"
    ],
    "path" => "/mnt/logs/default-host/access_log_2019-04-09",
    "environment" => "dev01",
    "@timestamp" => 2019-04-09T10:54:06.057Z,
    "filename" => "access_log_2019-04-09",
    "application" => "access",
    "host" => "22.245.242.190",
    "@version" => "1",
    "group" => "web"
}
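
A side note on the "month" => 0 value in the output above: it is consistent with the ruby filter running after grok had already failed. Assuming [@metadata][month] was never set, event.get returns nil, and Date::ABBR_MONTHNAMES holds nil at index 0, so the lookup yields 0. A minimal standalone sketch of that lookup (the helper name month_number is hypothetical and not part of the original config):

```ruby
require 'date'

# Hypothetical helper mirroring the lookup done in the ruby filter.
# Date::ABBR_MONTHNAMES is [nil, "Jan", "Feb", ..., "Dec"], so month
# numbers line up with array indices and index 0 holds nil.
def month_number(abbr)
  Date::ABBR_MONTHNAMES.index(abbr)
end

puts month_number('Apr').inspect # 4
puts month_number(nil).inspect   # 0, which is what an unset field produces
puts month_number('xyz').inspect # nil, unknown abbreviation
```

So a month of 0 here is a symptom of the failed match rather than a separate problem.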

Fluent Bit conf:

[SERVICE]
    Flush 1
    Daemon Off
    Log_Level ${FLUENT_LOGLEVEL}
    Parsers_file myapp.parser

[INPUT]
    Name tail
    Path /mnt/logs/*.log
    Path_Key LogFile
    Multiline On
    Parser_Firstline myapp_multiline_firstline

[INPUT]
    Name tail
    Path /mnt/logs/**/_log
    Path_Key LogFile
    Multiline Off

[FILTER]
    Name record_modifier
    Match *
    Record instance ${HOSTNAME}
    Record environment ${ENVIRONMENT}
    Record image ${APPLICATION}

[OUTPUT]
    Name http
    Match *
    Port ${LOG_COLLECTION_PORT}
    Host ${LOG_COLLECTION_HOST}
    Format json

I also tried different grok patterns, such as:
match => {  
    "message" => ["%{IPORHOST:clientip} : %{IPORHOST:hostname}\s*[-/]\s*(?:%{HTTPDUSER:remoteuser}|-)\s*\[%{HTTPDATE:timestamp}\]\s*(?:%{WORD:verb} %{NOTSPACE:request}(?: HTTP/%{NUMBER:httpversion})?|%{DATA:rawrequest})\s*%{NUMBER:responsecode}\s*(?:%{NUMBER:bytes}|-)\s*",
                  "%{IPORHOST} : %{IPORHOST}\s*[-/]\s*(?:%{HTTPDUSER}|-)\s*\[%{MONTHDAY:[@metadata][day]}/%{MONTH:[@metadata][month]}/%{YEAR:[@metadata][year]}[:\s\w\+]*\]\s"]
  }

I used the following websites to debug:
http://grokconstructor.appspot.com/do/match#result
http://grokdebug.herokuapp.com/

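That is not a valid regexp: (?[ is not a valid group opener (it should be (?:), and the literal square brackets around the timestamp need to be escaped. The following works: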

match => { "message" => ["(?:%{IP}|%{HOSTNAME}) : (?:%{IP}|%{HOSTNAME})\s*[-/]\s*(?:(?:[a-zA-Z][a-zA-Z0-9_.+-=:]+@%{HOSTNAME})|-)\s*\[%{MONTHDAY:[@metadata][day]}/%{MONTH:[@metadata][month]}/%{YEAR:[@metadata][year]}[:\s\w+]*\]\s"] }
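
To see why the first pattern is rejected, here is a small Ruby sketch. Grok patterns are compiled to Oniguruma-style regular expressions, the same family Ruby uses, so Ruby's own Regexp serves as a stand-in here. The helper valid_regex? is hypothetical, and the two fragments are simplified pieces of the patterns above with the %{HOSTNAME} reference replaced by a plain \S+ as an assumption.

```ruby
# Hypothetical helper: true if Ruby can compile the source as a regexp.
def valid_regex?(source)
  Regexp.new(source)
  true
rescue RegexpError
  false
end

# "(?[" opens a group with an unknown option, so compilation fails.
broken = '(?:(?[a-zA-Z][a-zA-Z0-9_.+-=:]+@\S+)|-)'
# "(?:" is an ordinary non-capturing group.
fixed  = '(?:(?:[a-zA-Z][a-zA-Z0-9_.+-=:]+@\S+)|-)'

puts valid_regex?(broken) # false
puts valid_regex?(fixed)  # true
```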

I tried the provided grok pattern, but it still fails with _grokparsefailure.

Then your message is not what it appears to be. Try block-quoting it. This configuration

input { generator { count => 1 message => '22.244.133.97 : 22.244.133.97 - - [09/Apr/2019:11:04:15 +0000] GET /app/include/style.css HTTP/1.1 200 6575 https://dev01-app-module-dev.roanprd-openshift.intra.absa.co.za/app/logoutNoFrames.do Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/55.0.2883.87 Safari/537.36 FHOnORPsyksg9FoyBiCUM92k 0.002' } }

filter {
    grok { match => { "message" => ["(?:%{IP}|%{HOSTNAME}) : (?:%{IP}|%{HOSTNAME})\s*[-/]\s*(?:(?:[a-zA-Z][a-zA-Z0-9_.+-=:]+@%{HOSTNAME})|-)\s*\[%{MONTHDAY:[@metadata][day]}/%{MONTH:[@metadata][month]}/%{YEAR:[@metadata][year]}[:\s\w+]*\]\s"] } }
}

gets me

 "@metadata" => {
      "day" => "09",
    "month" => "Apr",
     "year" => "2019"
},
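
The same check can be approximated outside Logstash. The sketch below is plain Ruby, not the grok pattern itself: the grok references %{IP}, %{HOSTNAME}, %{MONTHDAY}, %{MONTH}, and %{YEAR} are replaced by much simpler stand-in expressions (assumptions for illustration only), the helper extract_date is hypothetical, and the sample line is cut short after the byte count.

```ruby
# Simplified stand-ins for the grok patterns; these are assumptions and
# far looser than the real definitions.
IPV4 = /\d{1,3}(?:\.\d{1,3}){3}/
USER = /\S+@\S+/

# Same overall shape as the working pattern: two addresses, an optional
# user, then the bracketed timestamp with escaped square brackets.
ACCESS_PREFIX = /\A#{IPV4} : #{IPV4}\s*[-\/]\s*(?:#{USER}|-)\s*\[(?<day>\d{2})\/(?<month>[A-Z][a-z]{2})\/(?<year>\d{4})[:\s\w+]*\]\s/

# Returns a hash with day, month and year, or nil when nothing matches.
def extract_date(line)
  m = ACCESS_PREFIX.match(line.to_s)
  m && { day: m[:day], month: m[:month], year: m[:year] }
end

line = '22.244.133.97 : 22.244.133.97 - - [09/Apr/2019:11:04:15 +0000] GET /app/include/style.css HTTP/1.1 200 6575'
p extract_date(line) # day "09", month "Apr", year "2019"
p extract_date(nil)  # nil: a missing field gives the pattern nothing to match
```

If the text matches in isolation but fails in the pipeline, the field being matched probably does not contain what it seems to.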

Something is seriously wrong.

I tried building up the grok pattern piece by piece:

filter {
  if [application] == "access" {
    grok {
      break_on_match => false
      match => {
        "message" => ["(?:%{IP}|%{HOSTNAME})"]
      }
    }
  }
}

Even this minimal pattern gives a _grokparsefailure.

The access log is configured as:
<access-log pattern="%a : %h %l %u %t %r %s %b %{Referer}i %{User-Agent}i %S %T" prefix="access_log_"/>

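SOLVED.
It was not a grok problem.
The tail input in the Fluent Bit configuration was missing Key message. Without it, Fluent Bit stores each line under the key log (the output above shows the line in the "log" field), so the message field that grok was matching against did not exist. Thanks to James F from my team, who pointed this out to me.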

[INPUT]
    Name tail
    Path /mnt/logs/**/_log
    Path_Key LogFile
    Multiline Off
    Key message
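
If changing the Fluent Bit input is not an option, the mismatch could in principle also be handled on the Logstash side by copying log into message before grok runs. The sketch below shows the idea in plain Ruby. StubEvent is a hypothetical stand-in that only mimics the get and set methods of a Logstash event so the snippet runs on its own, and ensure_message is a hypothetical helper name; inside a ruby filter the same two calls would be made on the real event object.

```ruby
# Hypothetical stand-in for a Logstash event; only get/set are mimicked.
class StubEvent
  def initialize(fields)
    @fields = fields
  end

  def get(name)
    @fields[name]
  end

  def set(name, value)
    @fields[name] = value
  end
end

# Copy Fluent Bit's default key "log" into "message" when "message" is
# absent, so a grok filter matching on "message" has something to parse.
def ensure_message(event)
  event.set('message', event.get('log')) if event.get('message').nil?
  event
end

event = ensure_message(StubEvent.new('log' => '22.244.133.97 : 22.244.133.97 - - [09/Apr/2019:11:04:15 +0000] GET /'))
puts event.get('message')
```

Setting Key message in the tail input, as above, remains the simpler fix because it avoids carrying the line under two names.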
