Overview of current system
Logstash 2.x reads from Apache log files.
These log files come from a large number of web services that we run.
The intention of this process is to record the various requests made to the services and to capture the usage patterns across the service applications.
The main pattern uses grok to process the Apache log files; some of them arrive via Syslog:
grok {
  patterns_dir   => "${EDW_root}/patterns"
  break_on_match => true
  match => { "message" => "%{ENDOFFILE:eof_marker}" }
  match => { "message" => "%{SYSLOGTIMESTAMP} %{SYSLOGHOST} %{PROG}: %{IPORHOST}:%{POSINT} %{COMBINEDAPACHELOG}" }
  match => { "message" => "%{COMBINEDAPACHELOG}" }
}
Most of the work is done processing the HTTP requests, e.g.:
if "/news/" in [HTTP][request] {
  mutate {
    add_field => { "[service][action][genus]"   => "STATIC PAGE" }
    add_field => { "[service][action][species]" => "NEWS" }
  }
}
Some patterns are simple, others are complex combinations of grok and match.
In total there are hundreds of patterns to look for across all the services.
The current system runs as a set of batch processes that work through the previous day's logs.
The output from Logstash is sent to Elasticsearch indices.
The indices are deleted after an upstream ETL process has extracted the previous day's data.
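For context, the output stage is along these lines; this is a minimal sketch, and the host and the daily index name are placeholders, not our actual values:

output {
  elasticsearch {
    hosts => ["localhost:9200"]      # placeholder host
    index => "edw-%{+YYYY.MM.dd}"    # hypothetical daily index, deleted after the ETL extract
  }
}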
We want to migrate the current system in the following way:
- Replace Logstash's reading of the Apache web log files with Filebeat.
- Replace the batch processing with continual processing of Filebeat events by Logstash running in Docker, one container per service.
- The Elasticsearch output cannot change, as a large upstream ETL system extracts data from it on a daily basis.
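The Filebeat side of this migration would look something like the following sketch, assuming the apache module (named apache2 in older Filebeat releases) and a Logstash output; the host names and port are placeholders:

# filebeat.yml -- minimal sketch, placeholders only
filebeat.modules:
  - module: apache
    access:
      enabled: true

output.logstash:
  hosts: ["logstash:5044"]

with a matching beats input on the Logstash side:

input {
  beats {
    port => 5044
  }
}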
Questions that I have:
- If I use the Filebeat Apache module, am I right in assuming that I will not need to do the initial filtering with the grok patterns? (The Syslog element is being removed.)
- What will be the contents of the event that is sent to Logstash? Is it similar to the structure produced by the COMBINEDAPACHELOG grok pattern?