Single grok expression for different log formats?

Hey all, I am new to Logstash and I am setting up Logstash on a server that receives logs in multiple formats under the same type. I was wondering if there is a way I could write just one grok expression for these logs by adding some exceptions or optional values, or whether I need a separate grok expression for each log format, as I wrote below.

Here is the first log format:

    1.2.3.4 5.6.7.8 - - [06/Jun/2019:13:38:24 +0000] "GET /homepage/v1?HTTP/1.0" 200 25853 "test.com" "https://test.com/homepage" "useragent" "-" - 0.05 0.24

and the grok expression I use for it is:

    %{IP:internal-ip} %{IP:clientip} - - \[%{HTTPDATE:timestamp}\] \"(?:%{WORD:method} /%{DATA:page}/%{DATA:version}\? HTTP/%{NUMBER:httpversion})\"  %{NUMBER:httpresponse} %{NUMBER:responsebytes} %{QS:hostname} %{QS:referrer} %{QS:useragent} %{QS:affaid} - %{NUMBER:requesttime} %{NUMBER:responsetime}

And the second log format is:

    11.12.13.14 15.16.17.18 - - [06/Jun/2019:13:38:24 +0000] "GET /customersupport/form/v1?submission=yes&repeat=no&id=123HTTP/1.0" 200 25853 "test.com" "https://test.com/homepage" "useragent" "-" - 0.05 0.24

And the grok expression I wrote for the second format is:

    %{IP:internal-ip} %{IP:clientip} - - \[%{HTTPDATE:timestamp}\] \"(?:%{WORD:method} /%{DATA:page}/%{DATA:value}/%{DATA:version}\?%{DATA:submission}&%{DATA:repeat}&%{DATA:id} HTTP/%{NUMBER:httpversion})\"  %{NUMBER:httpresponse} %{NUMBER:responsebytes} %{QS:hostname} %{QS:referrer} %{QS:useragent} %{QS:affaid} - %{NUMBER:requesttime} %{NUMBER:responsetime}

They work well individually, but if I try to merge both formats into one expression using optional grok values, they either fail to match or produce an error.
If I do need to include both grok expressions in Logstash, could you advise me on how Logstash decides which grok expression to apply to each log format? Thanks

You can give a grok filter a list of patterns to match against. Since you need each pattern to match the entire line, you should definitely anchor it with ^. Otherwise, when a pattern does not match, grok has to retry the match starting at every position in the field you are matching against.

If that is not clear, consider that

    grok { match => { "message" => "HTTP/%{NUMBER:httpVersion}" } }

will match both of your messages. Grok does not care what comes before or after the part of the field that matches the pattern.
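For comparison, here is that unanchored fragment next to an anchored variant. They are two alternatives shown side by side to illustrate the difference, not two filters meant to run together, and the outcomes in the comments assume the two sample lines from the question.

```
# Unanchored: succeeds on any message that contains "HTTP/<number>"
# anywhere, so it matches both sample lines from the question.
grok { match => { "message" => "HTTP/%{NUMBER:httpVersion}" } }

# Anchored with ^: the message must begin with "HTTP/". Both sample
# lines begin with an IP address, so this fails on them and grok
# tags the event with _grokparsefailure.
grok { match => { "message" => "^HTTP/%{NUMBER:httpVersion}" } }
```

The anchored form is what you want for full-line patterns: a pattern that starts with ^ can only succeed at the start of the field, so grok does not go looking for a match further along the line.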

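I had to adjust the spaces in some of your patterns to match the messages (your patterns have a space before HTTP that the log lines do not, and a double space after the closing quote of the request), but that might have been an issue with cut and paste and my editor automatically wrapping.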

    grok {
        match => {
            "message" => [
                "^%{IP:internal-ip} %{IP:clientip} - - \[%{HTTPDATE:timestamp}\] \"(?:%{WORD:method} /%{DATA:page}/%{DATA:value}/%{DATA:version}\?%{DATA:submission}&%{DATA:repeat}&%{DATA:id}HTTP/%{NUMBER:httpversion})\" %{NUMBER:httpresponse} %{NUMBER:responsebytes} %{QS:hostname} %{QS:referrer} %{QS:useragent} %{QS:affaid} - %{NUMBER:requesttime} %{NUMBER:responsetime}",
                "^%{IP:internal-ip} %{IP:clientip} - - \[%{HTTPDATE:timestamp}\] \"(?:%{WORD:method} /%{DATA:page}/%{DATA:version}\?HTTP/%{NUMBER:httpversion})\" %{NUMBER:httpresponse} %{NUMBER:responsebytes} %{QS:hostname} %{QS:referrer} %{QS:useragent} %{QS:affaid} - %{NUMBER:requesttime} %{NUMBER:responsetime}"
            ]
        }
    }
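On the question of how Logstash picks a pattern: grok tries the patterns in the list in order and, with break_on_match left at its default of true, stops at the first one that matches. The sketch below spells that setting out and adds one possible way to tell afterwards which format a line was in. The tag names are hypothetical, and the check relies on the submission field only being created by the first (query-string) pattern.

```
filter {
  grok {
    match => {
      "message" => [
        "^%{IP:internal-ip} %{IP:clientip} - - \[%{HTTPDATE:timestamp}\] \"(?:%{WORD:method} /%{DATA:page}/%{DATA:value}/%{DATA:version}\?%{DATA:submission}&%{DATA:repeat}&%{DATA:id}HTTP/%{NUMBER:httpversion})\" %{NUMBER:httpresponse} %{NUMBER:responsebytes} %{QS:hostname} %{QS:referrer} %{QS:useragent} %{QS:affaid} - %{NUMBER:requesttime} %{NUMBER:responsetime}",
        "^%{IP:internal-ip} %{IP:clientip} - - \[%{HTTPDATE:timestamp}\] \"(?:%{WORD:method} /%{DATA:page}/%{DATA:version}\?HTTP/%{NUMBER:httpversion})\" %{NUMBER:httpresponse} %{NUMBER:responsebytes} %{QS:hostname} %{QS:referrer} %{QS:useragent} %{QS:affaid} - %{NUMBER:requesttime} %{NUMBER:responsetime}"
      ]
    }
    # true is already the default: stop at the first pattern in the list
    # that matches, so listing the more specific pattern first is a good habit.
    break_on_match => true
  }

  # Hypothetical tags. Only the first pattern creates [submission], so its
  # presence identifies the second log format; skip events grok rejected.
  if [submission] {
    mutate { add_tag => [ "form_request" ] }
  } else if "_grokparsefailure" not in [tags] {
    mutate { add_tag => [ "plain_request" ] }
  }
}
```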

I actually would not start with grok for this. Use dissect to chop up the common elements, then attack the results with (anchored) groks.

    dissect { mapping => { "message" => '%{internal-ip} %{clientip} - - [%{timestamp}] "%{requestString}" %{httpresponse} %{responsebytes} "%{hostname}" "%{referrer}" "%{useragent}" "%{somethingThatIsSometimesAffaid}" - %{requesttime} %{responsetime}' } }
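Here is an untested sketch of that two-stage approach, assuming the dissect mapping above fits your lines. Dissect splits out the parts both formats share, and a single grok with a list of anchored patterns then parses only requestString. The two request patterns are the request portions of the patterns earlier in the thread, and all field names are carried over from them. Anchoring with $ as well as ^ is an extra choice made here so that each pattern must consume the whole field.

```
filter {
  # Stage 1: split out the fields both log formats share. Single quotes
  # let the mapping contain literal double quotes without escaping.
  dissect {
    mapping => {
      "message" => '%{internal-ip} %{clientip} - - [%{timestamp}] "%{requestString}" %{httpresponse} %{responsebytes} "%{hostname}" "%{referrer}" "%{useragent}" "%{somethingThatIsSometimesAffaid}" - %{requesttime} %{responsetime}'
    }
  }

  # Stage 2: only the request string differs between the formats, so grok
  # just that field. Patterns are tried in order; the first match wins.
  grok {
    match => {
      "requestString" => [
        # e.g. GET /customersupport/form/v1?submission=yes&repeat=no&id=123HTTP/1.0
        "^%{WORD:method} /%{DATA:page}/%{DATA:value}/%{DATA:version}\?%{DATA:submission}&%{DATA:repeat}&%{DATA:id}HTTP/%{NUMBER:httpversion}$",
        # e.g. GET /homepage/v1?HTTP/1.0
        "^%{WORD:method} /%{DATA:page}/%{DATA:version}\?HTTP/%{NUMBER:httpversion}$"
      ]
    }
  }
}
```

Because the grok patterns now run against a short field instead of the whole line, they are easier to read and cheaper to retry when the first one fails.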
