I have a stream of logs coming in, and I ship them to Logstash using Filebeat. One such event looks like this:
1545371943.464899 IP 92.42.189.139.80 > 185.234.217.231.55618: Flags [P.], seq 1:513, ack 172, win 260, options [nop,nop,TS val 39623059 ecr 1955048678], length 512: HTTP: HTTP/1.1 301 Moved Permanently
E..4~o@.v...\*.......P.B..ww.-2.....&$.....
.\..t...HTTP/1.1 301 Moved Permanently
Content-Length: 250
Content-Type: text/html
Location: https://www.comstern.at/?q=0619659078515
Server: Microsoft-IIS/8.5
X-StackifyID: V2|903dbe21-8831-40eb-a36f-35230ee30abd|C57918|CD2
Date: Fri, 21 Dec 2018 05:59:02 GMT
<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
<html><head>
<title>301 Moved Permanently</title>
</head><body>
<h1>Moved Permanently</h1><p>The document has moved <a href="https://www.comstern.at/?q=0619659078515">here</a>.</p>
</body></html>
I need to extract all possible IPs, URLs, etc. from events like this. Is grok the way to go, given that the events don't follow any definite format? If so, how should I go about writing the grok pattern and then the filter?
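To make the goal concrete, here is a minimal sketch of the extraction in Python rather than Logstash configuration. It scans the whole message for every match instead of anchoring on a fixed layout. The regular expressions and the function name `extract_indicators` are assumptions made for illustration; they are simplified and are not the definitions behind grok's built-in patterns.

```python
import re

# Illustrative patterns only (assumptions, not grok's built-in definitions).
# The lookbehind stops a match from starting in the middle of a longer dotted
# number, so the trailing port in tcpdump's "ip.port" notation is left out.
IPV4_RE = re.compile(r"(?<![\d.])(?:\d{1,3}\.){3}\d{1,3}(?!\d)")
URL_RE = re.compile(r"https?://[^\s\"'<>]+")


def extract_indicators(message):
    """Return every distinct IPv4 address and URL found anywhere in message."""
    ips = []
    for candidate in IPV4_RE.findall(message):
        # Discard dotted quads whose octets are out of range (e.g. 999.1.1.1).
        octets_valid = all(int(octet) <= 255 for octet in candidate.split("."))
        if octets_valid and candidate not in ips:
            ips.append(candidate)

    urls = []
    for candidate in URL_RE.findall(message):
        if candidate not in urls:
            urls.append(candidate)

    return {"ips": ips, "urls": urls}
```

Applied to the event above, this should pick up both endpoint addresses from the tcpdump header line and a single copy of the redirect URL, even though the URL appears in both the Location header and the HTML body.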