Parse HTTP requests to the NASA Kennedy Space Center WWW server in Florida

Hi,

I'm working with the NASA Kennedy Space Center web server data set and I want to monitor it in Kibana.
So what I want is to collect the logs with Filebeat and parse them with Logstash.
Does anybody know how to configure the grok filter to extract all the meaningful information, so that I can then build visualisations in a Kibana dashboard?
Here is the log format:

3066-7.usi.edu - - [04/Aug/1995:10:13:34 -0400] "GET /images/launchmedium.gif HTTP/1.0" 200 11853
tiber.gsfc.nasa.gov - - [04/Aug/1995:10:13:36 -0400] "GET /elv/DELTA/delta.gif HTTP/1.0" 200 2244
unifex.ksc.nasa.gov - - [04/Aug/1995:10:13:38 -0400] "GET / HTTP/1.0" 304 0
unifex.ksc.nasa.gov - - [04/Aug/1995:10:13:38 -0400] "GET /images/MOSAIC-logosmall.gif HTTP/1.0" 304 0
unifex.ksc.nasa.gov - - [04/Aug/1995:10:13:38 -0400] "GET /images/ksclogo-medium.gif HTTP/1.0" 304 0

Need help please!!

Hi @jailbreakerSN,

If you download and extract Logstash (or install it from a package), you should find a file like
vendor/bundle/jruby/2.3.0/gems/logstash-patterns-core-4.1.2/patterns/httpd in the Logstash folder. It contains some ready-made patterns that might work as they are, or would only need a bit of editing to work for you.

# cat  vendor/bundle/jruby/2.3.0/gems/logstash-patterns-core-4.1.2/patterns/httpd
HTTPDUSER %{EMAILADDRESS}|%{USER}
HTTPDERROR_DATE %{DAY} %{MONTH} %{MONTHDAY} %{TIME} %{YEAR}

# Log formats
HTTPD_COMMONLOG %{IPORHOST:clientip} %{HTTPDUSER:ident} %{HTTPDUSER:auth} \[%{HTTPDATE:timestamp}\] "(?:%{WORD:verb} %{NOTSPACE:request}(?: HTTP/%{NUMBER:httpversion})?|%{DATA:rawrequest})" %{NUMBER:response} (?:%{NUMBER:bytes}|-)
HTTPD_COMBINEDLOG %{HTTPD_COMMONLOG} %{QS:referrer} %{QS:agent}

# Error logs
HTTPD20_ERRORLOG \[%{HTTPDERROR_DATE:timestamp}\] \[%{LOGLEVEL:loglevel}\] (?:\[client %{IPORHOST:clientip}\] ){0,1}%{GREEDYDATA:message}
HTTPD24_ERRORLOG \[%{HTTPDERROR_DATE:timestamp}\] \[%{WORD:module}:%{LOGLEVEL:loglevel}\] \[pid %{POSINT:pid}(:tid %{NUMBER:tid})?\]( \(%{POSINT:proxy_errorcode}\)%{DATA:proxy_message}:)?( \[client %{IPORHOST:clientip}:%{POSINT:clientport}\])?( %{DATA:errorcode}:)? %{GREEDYDATA:message}
HTTPD_ERRORLOG %{HTTPD20_ERRORLOG}|%{HTTPD24_ERRORLOG}

# Deprecated
COMMONAPACHELOG %{HTTPD_COMMONLOG}
COMBINEDAPACHELOG %{HTTPD_COMBINEDLOG}

There are also many other ready-made patterns, such as

HOSTNAME \b(?:[0-9A-Za-z][0-9A-Za-z-]{0,62})(?:\.(?:[0-9A-Za-z][0-9A-Za-z-]{0,62}))*(\.?|\b)

Combining these should give you what you want.

This is not very pretty, but it is almost complete:

%{HOSTNAME:hostname} - - \[%{MONTHDAY:day}\/%{MONTH:month}\/%{YEAR:year}:%{TIME:time} %{DATA:timezone}\] \"%{DATA:http_method} %{URIPATH:uri_path} %{DATA:protocol}\/%{NUMBER:version}\" %{NUMBER:response_code}

The end of the line and the optional bits need to be handled.
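
One way to handle the end of the line, sketched as a grok filter: the trailing byte count is matched with the same (?:%{NUMBER:bytes}|-) group that HTTPD_COMMONLOG uses. This is an untested sketch, not a definitive pattern. The field name bytes is borrowed from HTTPD_COMMONLOG, and a request without an HTTP version (which HTTPD_COMMONLOG covers with an optional group) would still not match.

```
filter {
  grok {
    # The pattern is single-quoted so the literal double quotes around
    # the request line do not need to be escaped.
    match => {
      "message" => '%{HOSTNAME:hostname} - - \[%{MONTHDAY:day}/%{MONTH:month}/%{YEAR:year}:%{TIME:time} %{DATA:timezone}\] "%{DATA:http_method} %{URIPATH:uri_path} %{DATA:protocol}/%{NUMBER:version}" %{NUMBER:response_code} (?:%{NUMBER:bytes}|-)'
    }
  }
}
```

Lines that do not match get a _grokparsefailure tag, which makes it easy to find the remaining optional cases in Kibana.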

Hi @A_B,

Thanks for the reply, it helped me a lot.
I tried it and the lines seem to be parsed well with the %{COMMONAPACHELOG} pattern.
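
For reference, a minimal sketch of a pipeline built around that pattern. The Beats port (5044) and the Elasticsearch host (localhost:9200) are assumptions for a local setup, and the field names (timestamp, response, bytes) are the ones from the pattern file quoted above; newer Logstash versions with ECS compatibility enabled may produce different field names.

```
input {
  beats {
    port => 5044                      # assumed port, must match the Filebeat output
  }
}

filter {
  grok {
    match => { "message" => "%{COMMONAPACHELOG}" }
  }
  # Use the request time from the log line as the event timestamp.
  date {
    match => [ "timestamp", "dd/MMM/yyyy:HH:mm:ss Z" ]
  }
  # Store the numeric fields as integers so Kibana can aggregate on them.
  mutate {
    convert => {
      "response" => "integer"
      "bytes"    => "integer"
    }
  }
}

output {
  elasticsearch {
    hosts => ["localhost:9200"]       # assumed local Elasticsearch
  }
}
```

Since the data set is from 1995, the time range in Kibana has to be set to that period to see the events after the date filter has run.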

Thanks again for the tips!
