I have input coming from eight Apache webservers. These logs include a fair amount of inter-cluster traffic. I'd like to exclude it from my results, since these requests aren't from users, just the servers running health checks.
I know which lines I don't want, and I can identify them either by the GET request or by the IP.
I see how grok lets me remove tags and fields, but how do I discard entire lines of input?
Reference: apache.conf file
filter {
  if [type] == "apache" {
    grok {
      match => { "message" => "%{COMBINEDAPACHELOG}" }
    }
    geoip {
      source => "clientip"
      target => "geoip"
      database => "/etc/logstash/GeoLiteCity.dat"
      add_field => [ "[geoip][coordinates]", "%{[geoip][longitude]}" ]
      add_field => [ "[geoip][coordinates]", "%{[geoip][latitude]}" ]
    }
    mutate {
      convert => [ "[geoip][coordinates]", "float" ]
    }
  }
}
Sample log line I want (sanitized)
108.121.141.101 - - [08/Jun/2015:07:35:12 -0700] "GET /foo/bar/galleries/bat.jpg HTTP/1.1" 304 - "http://www.client.com/members/biz" "Mozilla/5.0 (iPhone; CPU iPhone OS 8_3 like Mac OS X) AppleWebKit/600.1.4 (KHTML, like Gecko) CriOS/43.0.2357.51 Mobile/12F70 Safari/600.1.4" usrid="-" "-"
Sample log line I don't want
108.121.141.101 - - [08/Jun/2015:07:35:12 -0700] "GET /foo/bar/galleries/bat.jpg HTTP/1.1" 304 - "http://www.client.com/members/biz" "Mozilla/5.0 (iPhone; CPU iPhone OS 8_3 like Mac OS X) AppleWebKit/600.1.4 (KHTML, like Gecko) CriOS/43.0.2357.51 Mobile/12F70 Safari/600.1.4" usrid="-" "-"
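To make the goal concrete, here is a rough sketch of the kind of thing I'm after. Logstash has a drop filter that discards any event reaching it, so wrapping it in a conditional placed after grok should remove the health-check lines before they go any further. The IP 10.0.0.5 and the path /healthcheck are hypothetical placeholders, not my real values, and the clientip and request field names assume the fields that COMBINEDAPACHELOG extracts.

```
filter {
  if [type] == "apache" {
    grok {
      match => { "message" => "%{COMBINEDAPACHELOG}" }
    }
    # Hypothetical values: substitute the real health-check source IP
    # and request path. Events matching either condition are discarded.
    if [clientip] == "10.0.0.5" or [request] == "/healthcheck" {
      drop { }
    }
    # geoip and mutate would follow here, so dropped events skip them.
  }
}
```

Putting the conditional right after grok means the discarded events never reach the geoip lookup.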