Good point... it wasn't initially apparent how to do this.
Firstly, I have changed the webserver to list dummy financial products (as this is what I say I am doing in my thesis). So if the URLs look different from my original post, that's why.
So I have a webserver that logs URLs that I want to parse in Logstash, so I can group them on a per-user basis in Kibana. I also want to get a lot of the crap out of the log file (see the last line).
An example log would be:
<IP removed> - - [07/Apr/2016:12:41:38 +0000] "GET /product-category/banking/transaction-account/ HTTP/1.1" 200 6777 http://<IP removed>/product-category/banking/transaction-account/ wp-settings-1=libraryContent%3Dbrowse; wp-settings-time-1=1460008031; wordpress_test_cookie=WP+Cookie+check; wordpress_logged_in_cf8749745210721969771dfabf2df0ea=nathansturgess.pt%7C1460181231%7Czd5rFlzqVwPlbmfKEzJvzVhKeRoNLVfOJPbQNiHAZTs%7858084f4596e3a6863df44a88fbb59Cdc2d2145c62c3c6083519c1cfb8cf47ff2
I want to match on the Apache log data, the username in the cookie, and URLs that could be in any of the following formats:
/product-category/<category>/ (for example /product-category/banking/)
/product-category/<category>/<subcategory>/ (for example /product-category/banking/transaction-account/)
/product/<product>/ (for example /product/everyday-account/)
To do this matching I have the following filter snippet from my logstash.conf file:
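    filter {
      # Standard Apache fields, plus the username from the cookie (the text between the last matching "=" and the following "%")
      grok {
        match => { "message" => "\A%{COMMONAPACHELOG}(.*=(?<username>[^%]*)%)" }
      }
      # /product-category/<category>/
      grok {
        match => [ "request", "/product-category\/(?<category>.*?)\/" ]
      }
      # /product-category/<category>/<subcategory>/
      grok {
        match => [ "request", "/product-category\/.*?\/(?<subcategory>.*?)\/" ]
      }
      # /product/<product>/
      grok {
        match => [ "request", "/product\/(?<product>.*?)\/" ]
      }
      # Drop fields I don't need in Kibana
      mutate {
        remove_field => ["ident", "auth", "source", "hostname", "name", "host", "beat", "input_type"]
      }
    }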
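A possible tightening of the filter above, offered as an untested sketch rather than a drop-in replacement. By default grok adds a _grokparsefailure tag whenever its pattern does not match, so with three separate URL groks nearly every line gets tagged by at least one of them. A single grok with a list of patterns (most specific first, since grok stops at the first pattern that matches by default) and an empty tag_on_failure avoids that. The `^` anchors and `[^/]+` segments assume the request field always starts with the path, and the cookie pattern assumes the login cookie is always named wordpress_logged_in_<hex hash> as in the example line; both are assumptions, not something verified against the full log.

```
filter {
  grok {
    # Apache common log fields, then the username from the WordPress login cookie.
    # Assumption: the cookie name is wordpress_logged_in_ followed by a hex hash.
    match => { "message" => "\A%{COMMONAPACHELOG}.*wordpress_logged_in_[0-9a-f]+=(?<username>[^%;]+)" }
  }
  grok {
    # Patterns are tried in order and matching stops at the first hit,
    # so the category + subcategory form has to come before the category-only form.
    match => {
      "request" => [
        "^/product-category/(?<category>[^/]+)/(?<subcategory>[^/]+)/",
        "^/product-category/(?<category>[^/]+)/",
        "^/product/(?<product>[^/]+)/"
      ]
    }
    # Requests that are not product pages are expected, so don't tag them as failures.
    tag_on_failure => []
  }
}
```

The first grok will still tag lines from users who are not logged in, because no login cookie is present to match; the mutate block from the original filter can be kept unchanged after these two groks.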
Here is a screenshot of a couple of lines in Kibana:
I then plan to enrich the log lines that contain only a product with its category and subcategory, and the lines that contain a subcategory with its category. I still need to figure out how to do this.
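One possible approach to the enrichment, as an untested sketch: the translate filter looks up a field's value in a dictionary and writes the result to another field. The mappings below are hypothetical (whether everyday-account really belongs under transaction-account depends on the product catalogue), and the option names assume a recent version of the plugin; older versions use `field` and `destination` instead of `source` and `target`, and the plugin may need to be installed separately.

```
filter {
  # Lines that matched /product/<product>/ have no category or subcategory yet.
  if [product] {
    translate {
      source     => "product"
      target     => "subcategory"
      # Hypothetical product -> subcategory mapping
      dictionary => { "everyday-account" => "transaction-account" }
    }
  }
  # Fill in the category for any line that has a subcategory but no category.
  if [subcategory] and ![category] {
    translate {
      source     => "subcategory"
      target     => "category"
      # Hypothetical subcategory -> category mapping
      dictionary => { "transaction-account" => "banking" }
    }
  }
}
```

With more than a handful of products, the inline dictionaries could be moved to an external file referenced with `dictionary_path`, which keeps the mapping out of logstash.conf.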