Grouping several words in one pattern


#1

Hello all,

I just discovered grok today and I'm wondering how to capture a sequence of several words into a single field.

Here's an example.

The input:
> 83.149.9.216 - - [04/Jan/2015:05:13:42 +0000] "GET /presentations/logstash-monitorama-2013/images/kibana-search.png HTTP/1.1" 200 203023 "http://semicomplete.com/presentations/logstash-monitorama-2013/" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_1) AppleWebKit/537.36

My pattern for now:

%{IPORHOST:clientip} %{USER:ident} %{USER:auth} \[%{HTTPDATE:timestamp}\] "(?:%{WORD:method} %{NOTSPACE:request} (?:HTTP/%{NUMBER:httpversion})?|%{DATA:rawrequest})" %{NUMBER:response} (?:%{NUMBER:bytes}|-) "%{NOTSPACE:url}" "%{NOTSPACE:useragent} \(%{WORD:machine}; %{WORD:processor} %{WORD:system} %{WORD:system} %{WORD:system} %{WORD:systemversion}\) %{NOTSPACE:kitversion}

It's the "%{WORD:system} %{WORD:system} %{WORD:system}" part that's troubling me: it doesn't look very efficient, and it actually produces "Mac, OS, X" in the "system" field, whereas I'd like "Mac OS X". Basically, I want to group a chosen set of words into one expression.
I don't have much regex knowledge and I couldn't solve this using the similar questions I've found online.

Any help?

Thank you!


(Magnus Bäck) #2

Any particular reason you're not using the patterns for this kind of file that are shipped with Logstash? There's an example of HTTP log parsing in the Logstash intro documentation.


#3

Oh, I was just toying around with the tool to make sure I understand what I'm doing. I prefer to try writing several patterns of my own from scratch before using the existing ones.


(Magnus Bäck) #4

Okay. In this particular case you'll want to use one of the existing patterns and feed the resulting useragent field to the useragent filter. But to answer your question, you could e.g. do this to capture three words, each separated by at least one whitespace character, into the words field:

(?<words>\b\w+\s+\w+\s+\w+\b)

This would be the same but is slightly shorter and more extensible:

(?<words>\b\w+(\s+\w+){2}\b)
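
For anyone who wants to try both forms outside of Logstash, here is a minimal sketch in Python. A few assumptions to keep in mind: Python's re module writes named groups as (?P<name>...) rather than the (?<name>...) form used in grok, the inner group is made non-capturing here, and the surrounding machine/processor/systemversion groups and the parse_platform helper are only illustrative stand-ins for the %{WORD:...} parts of the pattern in post #1. The three-word group is placed in context because, on its own, it would simply grab the first three whitespace-separated words it finds.

```python
import re

# The two equivalent ways of matching exactly three whitespace-separated words.
THREE_WORDS_LONG = r"\b\w+\s+\w+\s+\w+\b"
THREE_WORDS_SHORT = r"\b\w+(?:\s+\w+){2}\b"  # change {2} to capture more words


def build(system_regex):
    """Embed a three-word regex in a pattern for the '(machine; cpu system version)' part."""
    return re.compile(
        r"\((?P<machine>\w+); (?P<processor>\w+) "
        r"(?P<system>" + system_regex + r") "
        r"(?P<systemversion>\w+)\)"
    )


def parse_platform(text, system_regex=THREE_WORDS_SHORT):
    """Return the named captures as a dict, or None if the text does not match."""
    match = build(system_regex).search(text)
    return match.groupdict() if match else None


if __name__ == "__main__":
    fragment = "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_1) AppleWebKit/537.36"
    print(parse_platform(fragment))
```

With the sample user agent from post #1, the system group should come back as the single string "Mac OS X" instead of three separate values.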

#5

Alright, that works indeed.
Thank you very much!

