Create new field from existing field using REGEX

I have a grok filter like this for Apache logs:

    grok {
      match => { "message" => '%{IP:ip}\s(%{GREEDYDATA:logname})?\s(%{GREEDYDATA:remote_user})?\s\[%{HTTPDATE:timestamp}\]\s"%{WORD:metoda}\s%{NOTSPACE:request}\s(HTTP/%{NUMBER:httpversion}")?\s%{NUMBER:response}\s%{GREEDYDATA:bytes}\s("%{GREEDYDATA:referer}")?\s("%{GREEDYDATA:user_setup}")?\s%{GREEDYDATA:canonical}' }
      remove_field => ["message"]
    }

    if [request] == "nagios.plc" {
      drop {}
    }

This is working perfectly fine.

I want to extract two more fields from request (which the match above extracts from the message field) using a regex.

For example, given request: /im/56/d9/17/z25006678F,Barka-hotelowa-na-Wisle.jpg
I want:
id: 25006678
Format: F
I tried something like this:

	grok {
         match => { "message" =>  '%{IP:ip}\s(%{GREEDYDATA:logname})?\s(%{GREEDYDATA:remote_user})?\s\[%{HTTPDATE:timestamp}\]\s\"%{WORD:metoda}\s%{NOTSPACE:request}\s(HTTP/%{NUMBER:httpversion}\")?\s%{NUMBER:response}\s%{GREEDYDATA:bytes}\s(\"%{GREEDYDATA:referer}\")?\s(\"%{GREEDYDATA:user_setup}\")?\s%{GREEDYDATA:canonical}' }
         remove_field => ["message"]
         }
  if [request] == "nagios.plc" {
    drop {}
  }
  grok {
        match => {"request" => "XX: (?<request>([0-9]*))" }
        match => {"request" => "Format: (?<request>(\D+).jpg") }

Then I tried moving the new matches ahead of the remove_field in the first grok and pointing them at "message" instead of "request", since Logstash might not know the request field at that point, but the logs still reported that something was wrong with those matches.

Should it be one match instead of three?
Should I use a different Logstash plugin?

You have a field that contains /im/56/d9/17/z25006678F,Barka-hotelowa-na-Wisle.jpg
and you are wondering why neither "XX: (?<request>([0-9]*))" nor "Format: (?<request>(\D+).jpg" matches that. Is that your question?

When I try those regexes on https://regex101.com/ they are catching those groups from request perfectly.
I do not know whether:

  • my config is valid with so many separate matches
  • I should use a more complex regex that also matches everything before it, and then everything will be OK

No, they are not. In the window on the right it says "Your regular expression does not match the subject string." This is why grok does not create any fields: the pattern does not match. Note the "XX: matches the characters XX: literally (case sensitive)" which is not coloured. The characters "XX: " do not occur in your string so they do not match.

If you remove the "Format: " your second pattern will match "F,Barka-hotelowa-na-Wisle.jpg". I think you want

    grok { match => { "request" => "z(?<r>[0-9]+)" } }
    grok { match => { "request" => "(?<s>\D),\D+\.jpg" } }
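On the question of one match versus several: the two patterns above could probably be folded into a single grok. The following is only a sketch. It assumes the format is always a single uppercase letter sitting between the digits and the comma, and the field names id and format are taken from the example in the question rather than from any existing config.

```
filter {
  # Literal "z", then the digits as id, then one uppercase letter as format,
  # then the comma that precedes the file name.
  grok {
    match => { "request" => "z(?<id>[0-9]+)(?<format>[A-Z])," }
  }
}
```

For the sample request this should give id 25006678 and format F. Requests that do not have this shape will not match, and grok will tag those events with _grokparsefailure by default.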

If you just remove the "XX: " then your (?<request>([0-9]*)) matches your string 43 different times, and grok would just return the first one. * means zero or more, and in "/im/56/d9/17/z25006678F,Barka-hotelowa-na-Wisle.jpg" there are zero digits before the leading /, so it matches there (and grok will match an empty string there), then there are zero digits before the i, so it matches there, zero before the m, so it matches there, and zero before the next /, so it matches there again. Then it matches 56, and so on. If you change it to (?<request>([0-9]+)) then it will match 56. You might be able to use "(?<r>[0-9]{3,})", which means at least 3 digits.
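To make the difference between the quantifiers concrete, here is a small sketch of the last two variants side by side. The field names digits_plus and digits_long are made up for illustration, and the comments describe what each pattern should capture from the sample request.

```
filter {
  # [0-9]+ needs at least one digit, so the first match is "56".
  grok { match => { "request" => "(?<digits_plus>[0-9]+)" } }

  # [0-9]{3,} needs at least three digits, so "56", "9" and "17" are skipped
  # and the first match is "25006678".
  grok { match => { "request" => "(?<digits_long>[0-9]{3,})" } }
}
```

The {3,} form only works as long as no earlier path segment contains three or more digits in a row, so anchoring on the leading z is likely the more robust choice.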
