Parse HTTP requests to the NASA Kennedy Space Center WWW server in Florida


I'm working on the NASA Kennedy Space Center web server dataset and I want to monitor it in Kibana.
My plan is to collect the logs with Filebeat and parse them with Logstash.
Does anybody know how to configure the grok filter to extract all the meaningful information, so that I can build visualizations on a Kibana dashboard?
Here is the log format:

- - [04/Aug/1995:10:13:34 -0400] "GET /images/launchmedium.gif HTTP/1.0" 200 11853
- - [04/Aug/1995:10:13:36 -0400] "GET /elv/DELTA/delta.gif HTTP/1.0" 200 2244
- - [04/Aug/1995:10:13:38 -0400] "GET / HTTP/1.0" 304 0
- - [04/Aug/1995:10:13:38 -0400] "GET /images/MOSAIC-logosmall.gif HTTP/1.0" 304 0
- - [04/Aug/1995:10:13:38 -0400] "GET /images/ksclogo-medium.gif HTTP/1.0" 304 0

I need help, please!

Hi @jailbreakerSN,

If you download and extract Logstash (or install it from a package), you should find a file like
vendor/bundle/jruby/2.3.0/gems/logstash-patterns-core-4.1.2/patterns/httpd in the Logstash folder. It contains some ready-made patterns that might work as they are, or would only need a bit of editing to work for you.

# cat  vendor/bundle/jruby/2.3.0/gems/logstash-patterns-core-4.1.2/patterns/httpd

# Log formats
HTTPD_COMMONLOG %{IPORHOST:clientip} %{HTTPDUSER:ident} %{HTTPDUSER:auth} \[%{HTTPDATE:timestamp}\] "(?:%{WORD:verb} %{NOTSPACE:request}(?: HTTP/%{NUMBER:httpversion})?|%{DATA:rawrequest})" %{NUMBER:response} (?:%{NUMBER:bytes}|-)

# Error logs
HTTPD20_ERRORLOG \[%{HTTPDERROR_DATE:timestamp}\] \[%{LOGLEVEL:loglevel}\] (?:\[client %{IPORHOST:clientip}\] ){0,1}%{GREEDYDATA:message}
HTTPD24_ERRORLOG \[%{HTTPDERROR_DATE:timestamp}\] \[%{WORD:module}:%{LOGLEVEL:loglevel}\] \[pid %{POSINT:pid}(:tid %{NUMBER:tid})?\]( \(%{POSINT:proxy_errorcode}\)%{DATA:proxy_message}:)?( \[client %{IPORHOST:clientip}:%{POSINT:clientport}\])?( %{DATA:errorcode}:)? %{GREEDYDATA:message}

# Deprecated
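To see which fields HTTPD_COMMONLOG pulls out of a NASA line, here is a rough Python translation of that pattern, run against one of the sample lines. It is only a sketch, not what Logstash executes: the grok sub-patterns (IPORHOST, HTTPDUSER, HTTPDATE, NUMBER) are approximated with simpler regexes, and the client host 192.0.2.10 is a placeholder because the host column does not appear in the log lines quoted in the question.

```python
import re

# Rough, simplified translation of HTTPD_COMMONLOG into a Python regex.
# Grok sub-patterns are approximated (e.g. IPORHOST and HTTPDUSER -> \S+).
HTTPD_COMMONLOG = re.compile(
    r'(?P<clientip>\S+) (?P<ident>\S+) (?P<auth>\S+) '
    r'\[(?P<timestamp>[^\]]+)\] '
    r'"(?:(?P<verb>\w+) (?P<request>\S+)(?: HTTP/(?P<httpversion>[\d.]+))?'
    r'|(?P<rawrequest>.*?))" '
    r'(?P<response>\d+) (?:(?P<bytes>\d+)|-)'
)

# 192.0.2.10 is a placeholder host from the documentation address range.
line = ('192.0.2.10 - - [04/Aug/1995:10:13:34 -0400] '
        '"GET /images/launchmedium.gif HTTP/1.0" 200 11853')

match = HTTPD_COMMONLOG.match(line)
fields = match.groupdict() if match else None
print(fields)
```

Note the `(?:(?P<bytes>\d+)|-)` at the end, which mirrors `(?:%{NUMBER:bytes}|-)` in the grok pattern: a line whose byte count is `-` still matches, and `bytes` is simply left empty.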

There are also many other ready-made patterns, such as

HOSTNAME \b(?:[0-9A-Za-z][0-9A-Za-z-]{0,62})(?:\.(?:[0-9A-Za-z][0-9A-Za-z-]{0,62}))*(\.?|\b)

Combining these should give you what you want.
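Combining works because grok expands each `%{NAME:field}` reference into the regex stored under NAME and wraps it in a named capture group. The Python sketch below imitates only that expansion step. The PATTERNS dictionary and the expand() helper are hypothetical, and the definitions are simplified stand-ins rather than the real ones from logstash-patterns-core; the real grok implementation does considerably more.

```python
import re

# Hypothetical, simplified pattern dictionary (not the real definitions).
PATTERNS = {
    "HOSTNAME": r"[0-9A-Za-z.-]+",
    "HTTPDATE": r"[^\]]+",
    "WORD": r"\w+",
    "NOTSPACE": r"\S+",
    "NUMBER": r"\d+(?:\.\d+)?",
    # A composite pattern built from other named patterns.
    "REQUEST": r"%{WORD:verb} %{NOTSPACE:request}",
}

# Matches %{NAME} or %{NAME:field}.
REF = re.compile(r"%\{(\w+)(?::(\w+))?\}")


def expand(pattern: str) -> str:
    """Recursively replace %{NAME} / %{NAME:field} references with regex text."""
    def substitute(ref: re.Match) -> str:
        name, field = ref.group(1), ref.group(2)
        body = expand(PATTERNS[name])
        # Named references become named groups; bare ones stay non-capturing.
        return f"(?P<{field}>{body})" if field else f"(?:{body})"
    return REF.sub(substitute, pattern)


grok = (r'%{HOSTNAME:host} - - \[%{HTTPDATE:timestamp}\] '
        r'"%{REQUEST} HTTP/%{NUMBER:version}" %{NUMBER:status}')
regex = re.compile(expand(grok))

# 192.0.2.10 is a placeholder host.
m = regex.match('192.0.2.10 - - [04/Aug/1995:10:13:36 -0400] '
                '"GET /elv/DELTA/delta.gif HTTP/1.0" 200 2244')
print(m.groupdict() if m else None)
```

Because REQUEST is itself written in terms of WORD and NOTSPACE, the expansion has to be recursive; that is what lets small patterns like HOSTNAME be reused inside bigger ones.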

This is not very pretty, but it is almost complete:

%{HOSTNAME:hostname} - - \[%{MONTHDAY:day}\/%{MONTH:month}\/%{YEAR:year}:%{TIME:time} %{DATA:timezone}\] \"%{DATA:http_method} %{URIPATH:uri_path} %{DATA:protocol}\/%{NUMBER:version}\" %{NUMBER:response_code}

It stops at the response code, so the end of the line (the byte count) and the optional parts still need to be handled.
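One way to handle those parts, sketched here as a Python regex rather than grok: make the ` protocol/version` part inside the quotes optional, and accept either a number or `-` as the byte count, the same `(?:%{NUMBER:bytes}|-)` trick that HTTPD_COMMONLOG uses. Field names are modeled on the pattern above. The host 192.0.2.10 is a placeholder, and the second sample line (no protocol, `-` bytes) is hypothetical, included only to exercise the optional branches.

```python
import re

# Python-regex version of the "almost complete" pattern, extended with:
#   - an optional " protocol/version" part inside the quotes
#   - a trailing byte count that may be a number or "-"
NASA_LINE = re.compile(
    r'(?P<hostname>\S+) - - '
    r'\[(?P<day>\d{2})/(?P<month>\w{3})/(?P<year>\d{4}):'
    r'(?P<time>\d{2}:\d{2}:\d{2}) (?P<timezone>[+-]\d{4})\] '
    r'"(?P<http_method>\w+) (?P<uri_path>\S+)'
    r'(?: (?P<protocol>\w+)/(?P<version>[\d.]+))?" '
    r'(?P<response_code>\d{3}) (?:(?P<bytes>\d+)|-)$'
)

samples = [
    # Placeholder host; otherwise one of the lines from the question.
    '192.0.2.10 - - [04/Aug/1995:10:13:38 -0400] "GET / HTTP/1.0" 304 0',
    # Hypothetical line: no protocol and "-" instead of a byte count.
    '192.0.2.10 - - [04/Aug/1995:10:13:38 -0400] "GET /images/MOSAIC-logosmall.gif" 404 -',
]

for line in samples:
    m = NASA_LINE.match(line)
    print(m.groupdict() if m else f"no match: {line}")
```

The trailing `$` makes the whole line required to match, so anything left over (for example two entries fused onto one line) shows up as a failed match instead of being silently ignored.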

Hi @A_B,

Thanks for the reply, it helped me a lot.
I tried it, and the logs seem to be parsed correctly with the %{COMMONAPACHELOG} pattern.

Thanks again for the tips!
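Once a line is parsed with %{COMMONAPACHELOG}, the timestamp is still a plain string such as 04/Aug/1995:10:13:34 -0400. For time-based Kibana visualizations it usually has to become the event time, which in Logstash is the job of the date filter. As an illustration of the format involved, here is the equivalent conversion in Python; it is not Logstash configuration, just a sketch of how the day/month/year:time offset layout is read.

```python
from datetime import datetime, timezone

# Layout of the NASA log timestamps:
# day/abbreviated month/year:hour:minute:second UTC-offset
HTTPDATE_FORMAT = "%d/%b/%Y:%H:%M:%S %z"

raw = "04/Aug/1995:10:13:34 -0400"
event_time = datetime.strptime(raw, HTTPDATE_FORMAT)

# Convert to UTC; Elasticsearch stores dates internally in UTC.
print(event_time.astimezone(timezone.utc).isoformat())
```

The `-0400` offset matters: without it, every event would be shifted by four hours relative to UTC.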
