Correct grok pattern?

Hello,

I have the following grok pattern:

(?<timestamp>\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}) \[%{DATA:err_severity}\] (%{NUMBER:pid:int}#%{NUMBER}: \*%{NUMBER}|\*%{NUMBER}) %{DATA:err_message}(?:, client: (?<client_ip>%{IP}|%{HOSTNAME}))(?:, server: %{GREEDYDATA:server})(?:, request: %{DATA:request}) HTTP/%{NUMBER:http_version}

in my logstash.conf, which restructures error messages from nginx.

This is my log message:

       "message" => "2016/02/05 17:00:08 [error] 23778#23778: *36414 app._default.company.com could not be resolved (3: Host not found), client: 12.1.12.123, server: waf._default.company.com, request: \"GET / HTTP/1.1\", host: \"11.1.111.111\"",
      "@version" => "1",
    "@timestamp" => "2016-02-05T17:00:08.000Z",
          "type" => "nginx",
          "tags" => [
    [0] "syslog",
    [1] "nginx",
    [2] "nginx_error"
],
          "host" => "11.1.111.111",
      "priority" => 187,
     "timestamp" => [
    [0] "Feb  5 17:00:08",
    [1] "2016/02/05 17:00:08"
],
     "logsource" => "ip-11-1-111-111",
       "program" => "nginx",
      "severity" => 3,
      "facility" => 23,
"facility_label" => "local7",
"severity_label" => "Error",
   "proxy_layer" => "nginx",
  "err_severity" => "error",
           "pid" => 23778,
   "err_message" => "app._default.company.com could not be resolved (3: Host not found)",
     "client_ip" => "12.1.12.123",
       "request" => "\"GET /",
  "http_version" => "1.1"

}

Two questions:

  1. I can't find a grok pattern that fits the last part of this log message (host: "11.1.111.111"). In my log file this last part is reported as host, but when I put the pattern into grokdebugger, it doesn't appear. How can I make sure that the pattern captures the whole log message?

  2. I used this grok pattern for the server entry: (?:, server: %{GREEDYDATA:server}). Is that correct?

Many thanks

I don't understand how the host field is created since it's not captured by your grok expression (and what happened to the server field?). I suggest you change the last part of the expression to this (untested):

(?:, request: "%{WORD:http_verb} %{NOTSPACE:request}) HTTP/%{NUMBER:http_version}", host: "(?<host>[^"]+)"

Notes:

  • I don't know why you have parenthesized parts of the expression ((?:...) etc.). AFAICT it just adds noise and makes the expression harder to read.
  • I changed how the HTTP verb and the request are extracted, splitting those tokens into separate fields.
  • I'd look into using the kv filter, since the second half of the message is a comma-separated list of key/value pairs. I'm just not sure whether the kv filter handles double-quoted values, which is necessary to cope with e.g. URLs containing commas.

What you currently use to extract the server field should technically work, but I recommend the expression above for extracting the host field. I don't recommend using DATA or GREEDYDATA in the middle of expressions, even though it works fine in many cases. Dropping GREEDYDATA should also improve performance slightly.
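
For reference, here is a minimal, untested sketch of how the suggested tail could sit in a complete grok filter. Several things in it are assumptions rather than anything confirmed in this thread: the timestamp capture is named timestamp (as the event output indicates), the literal brackets and asterisks are backslash-escaped, the server value is captured with NOTSPACE instead of GREEDYDATA, and the trailing host value goes into a hypothetical request_host field. The last choice is because the sample event already carries a host field, and grok appends to an existing field unless it is listed in overwrite, which would also explain why timestamp shows up as a two-element array in the output.

```
filter {
  grok {
    # Single-quoted string, so the double quotes in the log line need no escaping.
    # request_host is a made-up field name that avoids clashing with the existing host field.
    match => {
      "message" => '(?<timestamp>\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}) \[%{DATA:err_severity}\] (%{NUMBER:pid:int}#%{NUMBER}: \*%{NUMBER}|\*%{NUMBER}) %{DATA:err_message}, client: (?<client_ip>%{IP}|%{HOSTNAME}), server: %{NOTSPACE:server}, request: "%{WORD:http_verb} %{NOTSPACE:request} HTTP/%{NUMBER:http_version}", host: "(?<request_host>[^"]+)"'
    }
  }
}
```

If the value really should end up in host, adding overwrite => ["host"] to the grok block is the documented way to replace the existing value instead of turning the field into an array.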

Hello,

Thanks for your help.

I tried your suggestion:

(?:, request: "%{WORD:http_verb} %{NOTSPACE:request}) HTTP/%{NUMBER:http_version}", host: "(?<host>[^"]+)"

for the last part of my expression, but the response was that it didn't match.

Even when I tried to escape the comma, as seen here:

..."," host: "(?<host>[^"]+)"

the response was that it didn't match.

I tried to use a regex editor (Rubular) to test different variations, e.g. host: "(?<host>[^\d."])" and host: ""%{NUMBER:host}"", but all failed.

I did ultimately use your suggestion of the kv filter, as follows:

grok {
  ...
}
kv {
  field_split => "host"
}

and the host was printed out correctly in the logstash.log file. However, I would really like to match the grok pattern against the event without resorting to a kv filter. Is that possible?

Another question:

I have developed another grok pattern that completely matches an event. However, when I use this pattern in the Logstash config file, restart the logstash service and check logstash.log for the event, I see a tag named "_grokparsefailure" against it. Under what circumstances would a grok pattern that matches perfectly still fail against an event?

I've just started learning about grok in Logstash, so I'm still quite new to this.

Many thanks!

I tried your suggestion:

(?:, request: "%{WORD:http_verb} %{NOTSPACE:request}) HTTP/%{NUMBER:http_version}", host: "(?<host>[^"]+)"

for the last part of my expression, but the response was that it didn't match.

Please copy/paste your exact configuration and format it as code with the toolbar button.

kv {
field_split => "host"
}

No, that's not what I meant. I meant that you should use the kv filter for parsing the whole key/value part of the string, i.e. this part:

client: 12.1.12.123, server: waf._default.company.com, request: "GET / HTTP/1.1", host: "11.1.111.111",

A kv filter similar to this should be able to make sense of this:

kv {
  field_split => ", "
  value_split => ": "
}
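
One thing to check before relying on this: as far as I know, the kv filter treats field_split and value_split as sets of single characters rather than as literal strings, so ", " would split on either a comma or a space. Below is an untested sketch of one way around that, assuming a kv filter version that provides field_split_pattern and value_split_pattern. Grok first peels the key/value tail off into kv_tail, which is a made-up field name, and the nginx target is likewise only an illustration.

```
filter {
  grok {
    # Match the front of the line as before; everything from "client: "
    # to the end of the line goes into the hypothetical kv_tail field.
    match => {
      "message" => '(?<timestamp>\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}) \[%{DATA:err_severity}\] (%{NUMBER:pid:int}#%{NUMBER}: \*%{NUMBER}|\*%{NUMBER}) %{DATA:err_message}, (?<kv_tail>client: .*)'
    }
  }
  kv {
    source              => "kv_tail"
    # Multi-character delimiters; requires a kv version with the *_pattern options.
    field_split_pattern => ", "
    value_split_pattern => ": "
    # Nest the results under [nginx] so the parsed "host" key does not
    # collide with the event's existing host field.
    target              => "nginx"
  }
}
```

Whether the quoted request value comes through intact is worth verifying against a real event, since that was the open question about double-quoted values.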

I have developed another grok pattern that completely matches an event. However, when I use this pattern in the Logstash config file, restart the logstash service and check logstash.log for the event, I see a tag named "_grokparsefailure" against it. Under what circumstances would a grok pattern that matches perfectly still fail against an event?

I'm not into hypothetical questions. As above, show us the exact configuration and an example event. Make sure the same event isn't processed by multiple grok filters.
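
To illustrate the last point: if two grok filters run unconditionally, an event that matches one pattern still gets tagged _grokparsefailure by the other. Here is a minimal sketch of guarding each grok with a conditional, using the nginx_error tag from the sample event. The nginx_access branch, its COMBINEDAPACHELOG pattern, the shortened error pattern and the custom failure tag names are assumptions for illustration only.

```
filter {
  if "nginx_error" in [tags] {
    grok {
      # Shortened error-log pattern, just enough to show the structure.
      match => { "message" => '(?<timestamp>\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}) \[%{DATA:err_severity}\] %{GREEDYDATA:err_message}' }
      # A distinct failure tag shows which grok filter did not match.
      tag_on_failure => ["_grokparsefailure_nginx_error"]
    }
  } else if "nginx_access" in [tags] {
    grok {
      match => { "message" => "%{COMBINEDAPACHELOG}" }
      tag_on_failure => ["_grokparsefailure_nginx_access"]
    }
  }
}
```

With separate failure tags, a tagged event points directly at the filter whose pattern needs another look.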