_grokparsefailure in spite of working well with grok debugger

I am trying to send logs to Elasticsearch using Logstash. This is my Logstash config file:

input {
  beats {
    port => "5044"
  }
}
filter {
  grok {
    patterns_dir => ["/etc/logstash/conf.d/patterns"]
    match => {
      "message" => "%{DATA:remoteaddr},%{DATA:remoteuser},%{DATA:datetimeoflog},<<<<%{DATA:httpmethod} %{DATA:requestcontents} %{DATA:httpversion}>>>>,%{DATA:requeststatus},%{DATA:bodysizeinbytes},<<<<%{DATA:httprefferer}>>>>,<<<<%{DATA:useragent}>>>>,%{DATA:requesttime},%{DATA:upstreamconnecttime},%{DATA:upstreamheadertime},%{DATA:upstreamresponsetime}"
    }
  }
  date {
    match => [ "datetimeoflog", "dd/MMM/yyyy:HH:mm:ss Z" ]
    target => "@timestamp"
    timezone => "Asia/Kolkata"
  }
}
output {
  elasticsearch {
    hosts => [ "localhost:9200" ]
    index => "mammoth-%{[nameofindex]}"
  }
}

The custom pattern I am using is:

DATA (.*?)

This is the corresponding log line from the input file:

172.25.2.1,-,06/Apr/2019:14:24:41 +0530,<<<<POST /api/v1/weburls HTTP/1.1>>>>,200,95,<<<<http://localhost:3000/>>>>,<<<<Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/72.0.3626.81 Safari/537.36>>>>,1.223,0.000,1.216,1.216

The debug output from Logstash is as follows:

DEBUG] 2019-04-06 08:54:43.964 [[main]>worker4] pipeline - output received {"event"=>{"@version"=>"1", "typeoflog"=>"nginx_access", "source"=>"/home/vagrant/logs/nginx/mycode_app_access.log", "host"=>{"name"=>"ubuntu-bionic", "containerized"=>false, "id"=>"aa4c8e0e252d4e4687a49b82a0798def", "architecture"=>"x86_64", "os"=>{"family"=>"debian", "name"=>"Ubuntu", "codename"=>"bionic", "platform"=>"ubuntu", "version"=>"18.04.1 LTS (Bionic Beaver)"}}, "nameofindex"=>"nginx_wip", "log"=>{"file"=>{"path"=>"/home/vagrant/logs/nginx/mycode_app_access.log"}}, "message"=>"172.25.2.1,-,06/Apr/2019:14:24:41 +0530,<<<<POST /api/v1/weburls HTTP/1.1>>>>,200,95,<<<<http://localhost:3000/>>>>,<<<<Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/72.0.3626.81 Safari/537.36>>>>,1.223,0.000,1.216,1.216", "input"=>{"type"=>"log"}, "tags"=>["vagrant_nginx_access", "beats_input_codec_plain_applied", "_grokparsefailure"], "offset"=>28999, "@timestamp"=>2019-04-06T08:54:42.748Z, "beat"=>{"name"=>"ubuntu-bionic", "version"=>"6.6.2", "hostname"=>"ubuntu-bionic"}, "prospector"=>{"type"=>"log"}}}

The same pattern works fine in the Grok Debugger.

It doesn't look like you're using any custom grok patterns. I'm not sure whether the patterns_dir setting is conflicting somehow; comment it out and see if it works.

I tried that as well, but it had no effect. Can you suggest anything else?

I have never seen this log format with <<<< and >>>>. Even though the parse works in the test, maybe there's something internal messing it up. For instance, I have some patterns like this in my config that contain angle brackets:

(?<function>\S*)

Perhaps try simplifying your regex with <+ and >+ instead of <<<< and >>>>. Also try escaping them: \<+ and \>+.

Ah - try this pattern with a $ on the end:

%{DATA:remoteaddr},%{DATA:remoteuser},%{DATA:datetimeoflog},<<<<%{DATA:httpmethod} %{DATA:requestcontents} %{DATA:httpversion}>>>>,%{DATA:requeststatus},%{DATA:bodysizeinbytes},<<<<%{DATA:httprefferer}>>>>,<<<<%{DATA:useragent}>>>>,%{DATA:requesttime},%{DATA:upstreamconnecttime},%{DATA:upstreamheadertime},%{DATA:upstreamresponsetime}$

None of the above helped.

Well, the problem is that the last field, upstreamresponsetime, isn't getting matched. You need to change your grok pattern to ensure it gets matched.

input {
  generator {
    count => 1
    message => '172.25.2.1,-,06/Apr/2019:14:24:41 +0530,<<<<POST /api/v1/weburls HTTP/1.1>>>>,200,95,<<<<http://localhost:3000/>>>>,<<<<Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/72.0.3626.81 Safari/537.36>>>>,1.223,0.000,1.216,1.216'
  }
}

filter {
  grok {
    match => { "message" => "%{DATA:remoteaddr},%{DATA:remoteuser},%{DATA:datetimeoflog},<<<<%{DATA:httpmethod} %{DATA:requestcontents} %{DATA:httpversion}>>>>,%{DATA:requeststatus},%{DATA:bodysizeinbytes},<<<<%{DATA:httprefferer}>>>>,<<<<%{DATA:useragent}>>>>,%{DATA:requesttime},%{DATA:upstreamconnecttime},%{DATA:upstreamheadertime},%{DATA:upstreamresponsetime}" }
    pattern_definitions => { "DATA" => "(.*?)" }
  }
}

works just fine for me...

         "remoteaddr" => "172.25.2.1",
      "datetimeoflog" => "06/Apr/2019:14:24:41 +0530",
       "httprefferer" => "http://localhost:3000/",
         "httpmethod" => "POST",

etc.

Did it match the last field ok?

No. I do not get upstreamresponsetime unless I add the trailing $ (which I think is really strange), but I get all the other fields and I do not get a _grokparsefailure.
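
For what it's worth, my understanding of the $ behaviour is this: DATA is a lazy .*?, so when nothing follows it in the pattern it is satisfied by an empty match, and grok drops empty captures by default (keep_empty_captures is false). A trailing $ forces that last DATA to run to the end of the line. Here is a minimal sketch that should show this. The shortened message containing only the four timing fields and the stdout output are my assumptions for testing, not part of the original setup:

```
input {
  generator {
    count => 1
    # Made-up, shortened message: only the last four fields of the log line.
    message => '1.223,0.000,1.216,1.216'
  }
}

filter {
  grok {
    # Without the trailing $, the final lazy DATA can match the empty string,
    # so upstreamresponsetime would be missing from the event.
    match => { "message" => "^%{DATA:requesttime},%{DATA:upstreamconnecttime},%{DATA:upstreamheadertime},%{DATA:upstreamresponsetime}$" }
  }
}

output {
  # Print the parsed event so the captured fields can be inspected.
  stdout { codec => rubydebug }
}
```

Removing the $ and re-running should make the difference visible in the rubydebug output.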

Personally, I would use custom patterns such as (?<remoteaddr>[^,]*) instead of DATA. Anchoring the pattern with ^ at the start would also make it more efficient.
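
As a rough sketch of what that could look like for the whole line, here is an untested pattern built from inline named captures and anchored at both ends. The character classes are my guess based on the single sample line above: [^,]* for the comma-separated fields, [^ ]* and [^>]* inside the <<<< >>>> delimiters, and a lazy .*? for useragent because it contains a comma. The field names are unchanged from the original config.

```
filter {
  grok {
    match => {
      # ^ and $ anchor the whole line; each [^,]* stops at the next comma,
      # so the engine has far less backtracking to do than with DATA.
      "message" => "^(?<remoteaddr>[^,]*),(?<remoteuser>[^,]*),(?<datetimeoflog>[^,]*),<<<<(?<httpmethod>[^ ]*) (?<requestcontents>[^ ]*) (?<httpversion>[^>]*)>>>>,(?<requeststatus>[^,]*),(?<bodysizeinbytes>[^,]*),<<<<(?<httprefferer>[^>]*)>>>>,<<<<(?<useragent>.*?)>>>>,(?<requesttime>[^,]*),(?<upstreamconnecttime>[^,]*),(?<upstreamheadertime>[^,]*),(?<upstreamresponsetime>[^,]*)$"
    }
  }
}
```

If real requests can contain spaces in the URI or a > in the referrer, those classes would need loosening.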
