Help with grokking when the logs aren't of a definite format

Hi,

I have a stream of logs coming in, and I ship them to Logstash using Filebeat. One such event looks like this:

1545371943.464899 IP 92.42.189.139.80 > 185.234.217.231.55618: Flags [P.], seq 1:513, ack 172, win 
260, options [nop,nop,TS val 39623059 ecr 1955048678], length 512: HTTP: HTTP/1.1 301 Moved 
Permanently
E..4~o@.v...\*.......P.B..ww.-2.....&$.....
.\..t...HTTP/1.1 301 Moved Permanently
Content-Length: 250
Content-Type: text/html
Location: https://www.comstern.at/?q=0619659078515
Server: Microsoft-IIS/8.5
X-StackifyID: V2|903dbe21-8831-40eb-a36f-35230ee30abd|C57918|CD2
Date: Fri, 21 Dec 2018 05:59:02 GMT

<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
<html><head>
<title>301 Moved Permanently</title>
</head><body>
<h1>Moved Permanently</h1><p>The document has moved <a href="https://www.comstern.at/? 
q=0619659078515">here</a>.</p>
</body></html>

I need to extract all possible IPs, URLs, etc. Is grok the way to go, given that the events don't follow any definite format? If so, how should I go about writing the grok pattern and then the filter?

Any help is appreciated. Thanks

I'm blanking on how to get the first line parsed with grok, so I would do it with dissect instead. This should work...

    grok { match => { "message" => "Location: %{URI:url}
" } }
    dissect { mapping => { "message" => "%{}.%{} IP %{src}.%{+src}.%{+src}.%{+src}.%{} > %{dst}.%{+dst}.%{+dst}.%{+dst}.%{}" } }
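
For comparison, here is a minimal grok sketch for the same first line. It assumes the addresses are IPv4 with numeric ports, as in the sample event; the field names src_ip, src_port, dst_ip and dst_port are illustrative and not from the original post. The dot between address and port is written as [.] so the pattern does not depend on how backslashes are handled in the config string.

    grok {
      match => {
        # "IP <src>.<port> > <dst>.<port>:" as printed at the start of each event
        "message" => "IP %{IP:src_ip}[.]%{INT:src_port} > %{IP:dst_ip}[.]%{INT:dst_port}:"
      }
    }

Unlike the dissect mapping, this also captures the two ports, but it would need adjusting if the capture ever prints service names instead of port numbers.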

Hey @Badger, thanks for the prompt response. Is there a way to also extract the "href" which is part of the HTML?

Also, "Location" might not always hold the URL, as in the event below.

1545371990.044486 IP 185.234.217.231.35890 > 72.47.224.64.80: Flags [P.], seq 100908146:100908474, 
ack 1137501506, win 229, options [nop,nop,TS val 1585772740 ecr 141779380], length 328: HTTP: GET 
/index.php HTTP/1.0
E..|].@.@. $....H/.@.2.P...rC..B...........
^....sa.GET /index.php HTTP/1.0
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8
User-Agent: Mozilla/5.0 (Windows NT 6.3; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) 
Chrome/68.0.3440.84 Safari/537.36
Referer: http://crox.anatz.relayblog.com
Host: forum.youami.com.au
Connection: close

Here the URLs are in the "Referer" and "Host" headers. Is there any way to handle these scenarios as well?

Thanks

    grok { match => { "message" => "^Referer: %{DATA:referer}
" } }

That technique can be used to pull out any header. DATA is a lazy match, so it captures everything up to the literal newline that follows it in the pattern.
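
As a rough sketch of how this could be extended to several headers and to the href asked about above, the filter below tries a list of patterns against the same message. The field names http_host, referer, location and href are illustrative. It uses NOTSPACE rather than DATA plus a literal newline, on the assumption that these values never contain spaces, and single-quoted strings so the double quotes around the href value need no escaping.

    grok {
      # Try every pattern instead of stopping at the first one that matches.
      break_on_match => false
      # Events that contain none of these are left untagged.
      tag_on_failure => []
      match => {
        "message" => [
          '^Host: %{NOTSPACE:http_host}',
          '^Referer: %{NOTSPACE:referer}',
          '^Location: %{NOTSPACE:location}',
          'href="(?<href>[^"]+)"'
        ]
      }
    }

Each pattern captures only its first occurrence in the event, so a body with several links would still yield a single href.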
