Ignore groups that are not in the log

Hello.
I hope for your help.

I need to parse the nginx error log, but the problem is that one group or another may be absent from a log line, and then no match occurs.

Here's an example.
2020/08/11 14:16:35 [warn] 230521#230521: *573543 a client request body is buffered to a temporary file /var/cache/nginx/client_temp/0000020676, client: 172.31.41.113, server: _, request: "POST /taps/api/v1/notification/1151020/2018/submit HTTP/1.1", host: "lc.com", referrer: "http://lc/FL?cardId=902713&step=deductionSelector"

Pattern
(?<timestamp>%{YEAR}[./]%{MONTHNUM}[./]%{MONTHDAY} %{TIME}) \[%{LOGLEVEL:severity}\] %{NUMBER:pid}#%{NUMBER:threadid}: \*%{NUMBER:connectionid} %{DATA:errormessage}, client: %{IP:client}, server: %{DATA:server}, request: "%{DATA:verb} %{DATA:request} %{DATA:httpversion}", upstream: "%{DATA:upstream}", host: "%{DATA:host}", referrer: "%{DATA:referrer}"

As you can see, the log line has nothing for the group ", upstream: "%{DATA:upstream}"".

I also noticed that other groups may be missing from some log lines.

How can I mark these groups as optional so that they are ignored when absent but parsed correctly when present?

I tried this approach, but it doesn't seem to work: the values come out as null.
(?<timestamp>%{YEAR}[./]%{MONTHNUM}[./]%{MONTHDAY} %{TIME}) \[%{LOGLEVEL:severity}\] %{NUMBER:pid}#%{NUMBER:threadid}: \*%{NUMBER:connectionid} %{DATA:errormessage}, client: %{IP:client}, server: %{DATA:server}, request: "%{DATA:verb} %{DATA:request}(?: %{DATA:httpversion}")?(?:, upstream: "%{DATA:upstream}")?(?:, host: "%{DATA:host}")?(?:, referrer: "%{DATA:referrer}")?

http://grokdebug.herokuapp.com/

 {
  "timestamp": [
    [
      "2020/08/11 14:16:35"
    ]
  ],
  "YEAR": [
    [
      "2020"
    ]
  ],
  "MONTHNUM": [
    [
      "08"
    ]
  ],
  "MONTHDAY": [
    [
      "11"
    ]
  ],
  "TIME": [
    [
      "14:16:35"
    ]
  ],
  "HOUR": [
    [
      "14"
    ]
  ],
  "MINUTE": [
    [
      "16"
    ]
  ],
  "SECOND": [
    [
      "35"
    ]
  ],
  "severity": [
    [
      "warn"
    ]
  ],
  "pid": [
    [
      "230521"
    ]
  ],
  "BASE10NUM": [
    [
      "230521",
      "230521",
      "573543"
    ]
  ],
  "threadid": [
    [
      "230521"
    ]
  ],
  "connectionid": [
    [
      "573543"
    ]
  ],
  "errormessage": [
    [
      "a client request body is buffered to a temporary file /var/cache/nginx/client_temp/0000020676"
    ]
  ],
  "client": [
    [
      "172.31.41.113"
    ]
  ],
  "IPV6": [
    [
      null
    ]
  ],
  "IPV4": [
    [
      "172.31.41.113"
    ]
  ],
  "server": [
    [
      "_"
    ]
  ],
  "verb": [
    [
      "POST"
    ]
  ],
  "request": [
    [
      ""
    ]
  ],
  "httpversion": [
    [
      null
    ]
  ],
  "upstream": [
    [
      null
    ]
  ],
  "host": [
    [
      null
    ]
  ],
  "referrer": [
    [
      null
    ]
  ]
}

Could you perhaps suggest a universal pattern for the nginx error log?

What do you not like about the results you get with that?

2020/08/11 14:16:35 [warn] 230521#230521: *573543 a client request body is buffered to a temporary file /var/cache/nginx/client_temp/0000020676, client: 172.31.41.113, server: _, request: "POST /taps/api/v1/notification/1151020/2018/submit HTTP/1.1", host: "lc.com", referrer: "http://lc/FL?cardId=902713&step=deductionSelector"

The parsing is unsuccessful if I highlight something in this way before (?:, upstream: "%{DATA:upstream}")?

for example
(?<timestamp>%{YEAR}[./]%{MONTHNUM}[./]%{MONTHDAY} %{TIME}) \[%{LOGLEVEL:severity}\] %{POSINT:pid}#%{NUMBER:threadid}: \*%{NUMBER:connectionid} %{DATA:errormessage}, client: %{IP:client}, server: %{DATA:server}, request: "%{DATA:verb} %{DATA:request} %{DATA:httpversion}"(?:, upstream: "%{DATA:upstream}")?(?:, host: "%{DATA:host}")?(?:, referrer: "%{DATA:referrer}")?

in this form it works if I remove one of the groups in the log
, host: "lc.com", referrer: "http://lc/FL?cardId=902713&step=deductionSelector"

but if I start to isolate
(?: %{DATA:httpversion}")? or (?:, request: "%{DATA:verb} %{DATA:request} %{DATA:httpversion}")?

then the parsing fails

{
  "timestamp": [
    [
      "2020/08/11 14:16:35"
    ]
  ],
  "YEAR": [
    [
      "2020"
    ]
  ],
  "MONTHNUM": [
    [
      "08"
    ]
  ],
  "MONTHDAY": [
    [
      "11"
    ]
  ],
  "TIME": [
    [
      "14:16:35"
    ]
  ],
  "HOUR": [
    [
      "14"
    ]
  ],
  "MINUTE": [
    [
      "16"
    ]
  ],
  "SECOND": [
    [
      "35"
    ]
  ],
  "severity": [
    [
      "warn"
    ]
  ],
  "pid": [
    [
      "230521"
    ]
  ],
  "threadid": [
    [
      "230521"
    ]
  ],
  "BASE10NUM": [
    [
      "230521",
      "573543"
    ]
  ],
  "connectionid": [
    [
      "573543"
    ]
  ],
  "errormessage": [
    [
      "a client request body is buffered to a temporary file /var/cache/nginx/client_temp/0000020676"
    ]
  ],
  "client": [
    [
      "172.31.41.113"
    ]
  ],
  "IPV6": [
    [
      null
    ]
  ],
  "IPV4": [
    [
      "172.31.41.113"
    ]
  ],
  "server": [
    [
      "_"
    ]
  ],
  "verb": [
    [
      "POST"
    ]
  ],
  "request": [
    [
      ""
    ]
  ],
  "httpversion": [
    [
      null
    ]
  ],
  "upstream": [
    [
      null
    ]
  ],
  "host": [
    [
      null
    ]
  ],
  "referrer": [
    [
      null
    ]
  ]
}

or

{
  "timestamp": [
    [
      "2020/08/11 14:16:35"
    ]
  ],
  "YEAR": [
    [
      "2020"
    ]
  ],
  "MONTHNUM": [
    [
      "08"
    ]
  ],
  "MONTHDAY": [
    [
      "11"
    ]
  ],
  "TIME": [
    [
      "14:16:35"
    ]
  ],
  "HOUR": [
    [
      "14"
    ]
  ],
  "MINUTE": [
    [
      "16"
    ]
  ],
  "SECOND": [
    [
      "35"
    ]
  ],
  "severity": [
    [
      "warn"
    ]
  ],
  "pid": [
    [
      "230521"
    ]
  ],
  "threadid": [
    [
      "230521"
    ]
  ],
  "BASE10NUM": [
    [
      "230521",
      "573543"
    ]
  ],
  "connectionid": [
    [
      "573543"
    ]
  ],
  "errormessage": [
    [
      "a client request body is buffered to a temporary file /var/cache/nginx/client_temp/0000020676"
    ]
  ],
  "client": [
    [
      "172.31.41.113"
    ]
  ],
  "IPV6": [
    [
      null
    ]
  ],
  "IPV4": [
    [
      "172.31.41.113"
    ]
  ],
  "server": [
    [
      ""
    ]
  ],
  "verb": [
    [
      null
    ]
  ],
  "request": [
    [
      null
    ]
  ],
  "httpversion": [
    [
      null
    ]
  ],
  "upstream": [
    [
      null
    ]
  ],
  "host": [
    [
      null
    ]
  ],
  "referrer": [
    [
      null
    ]
  ]
}

I do not understand what you mean by highlight or isolate. Please provide the pattern you are using and examples of messages that do and do not parse correctly. Select the text of the pattern and messages and click on </> in the toolbar above the text edit panel so that they are formatted correctly.

Pattern
(?<timestamp>%{YEAR}[./]%{MONTHNUM}[./]%{MONTHDAY} %{TIME}) \[%{LOGLEVEL:severity}\] %{POSINT:pid}#%{NUMBER:threadid}\: \*%{NUMBER:connectionid} %{DATA:errormessage}, client: %{IP:client}, server: %{DATA:server_nginx}, request: "%{DATA:verb} %{DATA:request} %{DATA:httpversion}"(?:, upstream: "%{DATA:upstream}")?(?:, host: "%{DATA:host}")?(?:, referrer: "%{DATA:referrer}")?

Log
2020/08/11 14:16:35 [warn] 230521#230521: *573543 a client request body is buffered to a temporary file /var/cache/nginx/client_temp/0000020676, client: 172.31.41.113, server: _, request: "POST /taps/api/v1/notification/1151020/2018/submit HTTP/1.1", host: "lc.com", referrer: "http://lc/FL?cardId=902713&step=deductionSelector"

If I try to change the pattern like this

(?: %{DATA:httpversion}")? or (?:, request: "%{DATA:verb} %{DATA:request} %{DATA:httpversion}")?

then the values are lost (null):

  "request": [
    [
      ""
    ]
  ],
  "httpversion": [
    [
      null
    ]
  ],
  "upstream": [
    [
      null
    ]
  ],
  "host": [
    [
      null
    ]
  ],
  "referrer": [
    [
      null
    ]
  ]
}

What is the complete pattern once you have changed it?

(?<timestamp>%{YEAR}[./]%{MONTHNUM}[./]%{MONTHDAY} %{TIME}) \[%{LOGLEVEL:severity}\] %{POSINT:pid}#%{NUMBER:threadid}\: \*%{NUMBER:connectionid} %{DATA:errormessage}, client: %{IP:client}, server: %{DATA:server_nginx}, request: "%{DATA:verb} %{DATA:request}(?: %{DATA:httpversion}")?(?:, upstream: "%{DATA:upstream}")?(?:, host: "%{DATA:host}")?(?:, referrer: "%{DATA:referrer}")?

or

(?<timestamp>%{YEAR}[./]%{MONTHNUM}[./]%{MONTHDAY} %{TIME}) \[%{LOGLEVEL:severity}\] %{POSINT:pid}#%{NUMBER:threadid}\: \*%{NUMBER:connectionid} %{DATA:errormessage}, client: %{IP:client}, server: %{DATA:server_nginx}(?:, request: "%{DATA:verb} %{DATA:request} %{DATA:httpversion}")?(?:, upstream: "%{DATA:upstream}")?(?:, host: "%{DATA:host}")?(?:, referrer: "%{DATA:referrer}")?

result

{
  "timestamp": [
    [
      "2020/08/11 14:16:35"
    ]
  ],
  "YEAR": [
    [
      "2020"
    ]
  ],
  "MONTHNUM": [
    [
      "08"
    ]
  ],
  "MONTHDAY": [
    [
      "11"
    ]
  ],
  "TIME": [
    [
      "14:16:35"
    ]
  ],
  "HOUR": [
    [
      "14"
    ]
  ],
  "MINUTE": [
    [
      "16"
    ]
  ],
  "SECOND": [
    [
      "35"
    ]
  ],
  "severity": [
    [
      "warn"
    ]
  ],
  "pid": [
    [
      "230521"
    ]
  ],
  "threadid": [
    [
      "230521"
    ]
  ],
  "BASE10NUM": [
    [
      "230521",
      "573543"
    ]
  ],
  "connectionid": [
    [
      "573543"
    ]
  ],
  "errormessage": [
    [
      "a client request body is buffered to a temporary file /var/cache/nginx/client_temp/0000020676"
    ]
  ],
  "client": [
    [
      "172.31.41.113"
    ]
  ],
  "IPV6": [
    [
      null
    ]
  ],
  "IPV4": [
    [
      "172.31.41.113"
    ]
  ],
  "server_nginx": [
    [
      "_"
    ]
  ],
  "verb": [
    [
      "POST"
    ]
  ],
  "request": [
    [
      ""
    ]
  ],
  "httpversion": [
    [
      null
    ]
  ],
  "upstream": [
    [
      null
    ]
  ],
  "host": [
    [
      null
    ]
  ],
  "referrer": [
    [
      null
    ]
  ]
}

and

{
  "timestamp": [
    [
      "2020/08/11 14:16:35"
    ]
  ],
  "YEAR": [
    [
      "2020"
    ]
  ],
  "MONTHNUM": [
    [
      "08"
    ]
  ],
  "MONTHDAY": [
    [
      "11"
    ]
  ],
  "TIME": [
    [
      "14:16:35"
    ]
  ],
  "HOUR": [
    [
      "14"
    ]
  ],
  "MINUTE": [
    [
      "16"
    ]
  ],
  "SECOND": [
    [
      "35"
    ]
  ],
  "severity": [
    [
      "warn"
    ]
  ],
  "pid": [
    [
      "230521"
    ]
  ],
  "threadid": [
    [
      "230521"
    ]
  ],
  "BASE10NUM": [
    [
      "230521",
      "573543"
    ]
  ],
  "connectionid": [
    [
      "573543"
    ]
  ],
  "errormessage": [
    [
      "a client request body is buffered to a temporary file /var/cache/nginx/client_temp/0000020676"
    ]
  ],
  "client": [
    [
      "172.31.41.113"
    ]
  ],
  "IPV6": [
    [
      null
    ]
  ],
  "IPV4": [
    [
      "172.31.41.113"
    ]
  ],
  "server_nginx": [
    [
      ""
    ]
  ],
  "verb": [
    [
      null
    ]
  ],
  "request": [
    [
      null
    ]
  ],
  "httpversion": [
    [
      null
    ]
  ],
  "upstream": [
    [
      null
    ]
  ],
  "host": [
    [
      null
    ]
  ],
  "referrer": [
    [
      null
    ]
  ]
}

With that pattern nothing after [verb] matches, not even [request], which actually tells you what is happening. If you add keep_empty_captures => true you will see that

"request" => "",

That is, it matched an empty string, so at that point none of the optional fields have any chance of matching. If you change %{DATA:request} to %{URIPATHPARAM:request} or even just %{NOTSPACE:request} that forces it to consume some of the message, allowing the optional fields to line up.
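To make that behaviour concrete, here is a minimal sketch in Python. It is only an approximation: Python's `re` module stands in for the regex engine grok uses, the pattern is cut down to the part starting at `request:`, and `DATA` and `NOTSPACE` are written out as `.*?` and `\S+`, which is how the standard grok patterns define them. The names `BROKEN`, `FIXED` and `TAIL_OPTIONAL` are made up for this illustration.

```python
import re

# The sample log line from this thread.
LINE = (
    '2020/08/11 14:16:35 [warn] 230521#230521: *573543 a client request body '
    'is buffered to a temporary file /var/cache/nginx/client_temp/0000020676, '
    'client: 172.31.41.113, server: _, '
    'request: "POST /taps/api/v1/notification/1151020/2018/submit HTTP/1.1", '
    'host: "lc.com", referrer: "http://lc/FL?cardId=902713&step=deductionSelector"'
)

# Optional trailing fields, each wrapped in (?: ... )? as in the thread.
TAIL_OPTIONAL = (
    r'(?:, upstream: "(?P<upstream>.*?)")?'
    r'(?:, host: "(?P<host>.*?)")?'
    r'(?:, referrer: "(?P<referrer>.*?)")?'
)

# DATA is lazy (.*?). Followed only by optional groups, it is satisfied
# with an empty string, so every optional group is then skipped.
BROKEN = re.compile(
    r'request: "(?P<verb>.*?) (?P<request>.*?)(?: (?P<httpversion>.*?)")?'
    + TAIL_OPTIONAL
)

# NOTSPACE (\S+) must consume at least one character, so the optional
# groups are tried at the right position in the line.
FIXED = re.compile(
    r'request: "(?P<verb>.*?) (?P<request>\S+)(?: (?P<httpversion>.*?)")?'
    + TAIL_OPTIONAL
)

broken = BROKEN.search(LINE).groupdict()
fixed = FIXED.search(LINE).groupdict()

print(broken)  # request is '' and all optional fields are None
print(fixed)   # request, httpversion, host and referrer are filled in
```

With the lazy version the match ends right after `POST `, which is the same picture as the grok debugger output above. With `\S+` the `upstream` group is still skipped for this line because it is absent, while `host` and `referrer` are captured.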
