Migration from Logstash 2.x to Filebeat, Logstash and Elasticsearch 6.x

Overview of current system

Logstash 2.x reads directly from Apache log files.

These log files come from a large number of web services that we run.

The intention of this process is to record the various requests made to the services and the usage patterns across the service applications.

The main pattern uses grok to process the Apache log files; some of these arrive via syslog:

grok {
  patterns_dir => "${EDW_root}/patterns"
  break_on_match => true
  # ENDOFFILE is a custom pattern defined in patterns_dir
  match => { "message" => "%{ENDOFFILE:eof_marker}" }
  # Apache access lines that arrive wrapped in a syslog envelope
  match => { "message" => "%{SYSLOGTIMESTAMP} %{SYSLOGHOST} %{PROG}: %{IPORHOST}:%{POSINT} %{COMBINEDAPACHELOG}" }
  # Plain Apache combined-format access lines
  match => { "message" => "%{COMBINEDAPACHELOG}" }
}
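
Note that ENDOFFILE is not a stock grok pattern; it is defined in the patterns_dir referenced above. A minimal sketch of such a custom pattern file (the file name and marker text are assumptions):

# ${EDW_root}/patterns/edw (hypothetical file)
ENDOFFILE ^#END-OF-FILE#$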

Most of the work is done processing the HTTP requests, e.g.:

if "/news/" in [HTTP][request] {
mutate {
add_field => { "[service][action][genus]" => "STATIC PAGE" }
add_field => { "[service][action][species]" => "NEWS" }
}
}

Some patterns are simple; others are complex combinations of grok and conditional matching.

In total there are hundreds of patterns to look for across all the services.

The current system runs as a set of batch processes that process the previous day's logs.
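
As a rough illustration, the input side of such a batch job might be a file input that re-reads yesterday's rotated log from the beginning on each run (the path and rotation scheme are assumptions):

input {
  file {
    path => "/var/log/httpd/access_log.1"   # hypothetical rotated-log path
    start_position => "beginning"           # read the whole file, not just new lines
    sincedb_path => "/dev/null"             # forget the read position between batch runs
  }
}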

The output from Logstash is sent to Elasticsearch indices.

The indices are deleted after an upstream ETL process has extracted the previous day's data.
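
For example, the output stage could write to a daily index so that each day's data can be dropped once the ETL run has consumed it; a minimal sketch (the host and index name are assumptions):

output {
  elasticsearch {
    hosts => ["http://elasticsearch:9200"]   # hypothetical host
    index => "edw-%{+YYYY.MM.dd}"            # hypothetical daily index name
  }
}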

We want to migrate the current system in the following way:

  1. Replace Logstash's direct reading of the Apache web log files with Filebeat.
  2. Replace the batch processing with continuous processing of Filebeat events by Logstash running in Docker, one instance per service (see the sketch after this list).
  3. The Elasticsearch output cannot change, as a large upstream ETL system extracts data from it daily.
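
A minimal sketch of the Filebeat side of that setup, shipping raw log lines to a per-service Logstash container (Filebeat 6.3+ syntax; the paths, host and port are assumptions):

filebeat.inputs:
  - type: log
    paths:
      - /var/log/httpd/access_log*   # hypothetical Apache log path

output.logstash:
  hosts: ["logstash-news:5044"]      # hypothetical per-service Logstash container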

Questions that I have:

  1. If I use the Filebeat Apache module, am I right in assuming that I will not need to do the initial filtering with the grok patterns (the syslog element is being removed)?
  2. What will be the contents of the event that is sent to Logstash? Is it similar to the structure produced by the COMBINEDAPACHELOG grok pattern?

Filebeat modules on their own cannot do any preprocessing; they rely on the Ingest node functionality of Elasticsearch. So in order to make use of the Filebeat Apache module, Filebeat needs to send its events to the Elasticsearch output.
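
In that setup the module is enabled in Filebeat and the output points straight at Elasticsearch, roughly like this (Filebeat 6.x syntax; the host is an assumption):

filebeat.modules:
  - module: apache2
    access:
      enabled: true

output.elasticsearch:
  hosts: ["http://elasticsearch:9200"]   # hypothetical host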

These are example events sent to Elasticsearch: https://github.com/elastic/beats/blob/master/filebeat/module/apache2/access/test/test.log-expected.json
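
For comparison, when Filebeat forwards to Logstash no module processing has happened yet, so the event is essentially the raw log line plus Beats metadata, roughly like this (the exact field set varies by Filebeat version, and the values here are invented):

{
  "@timestamp": "2018-01-01T00:00:00.000Z",
  "message": "203.0.113.7 - - [01/Jan/2018:00:00:00 +0000] \"GET /news/ HTTP/1.1\" 200 1234 \"-\" \"curl/7.52.1\"",
  "source": "/var/log/httpd/access_log",
  "offset": 12345,
  "beat": { "hostname": "web-01", "name": "web-01", "version": "6.2.4" }
}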

So either you extend the Apache module's Ingest pipeline with your own patterns, or you read the files with Filebeat's log input and forward them to Logstash. In the latter case Logstash still needs to do all the processing.
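
For the second option, the receiving Logstash pipeline can reuse the existing grok work behind a beats input; a minimal sketch, assuming the port and daily-index scheme from the examples above:

input {
  beats {
    port => 5044   # must match the port in output.logstash in filebeat.yml
  }
}

filter {
  grok {
    # Same combined-log parsing as the current batch pipeline
    match => { "message" => "%{COMBINEDAPACHELOG}" }
  }
}

output {
  elasticsearch {
    hosts => ["http://elasticsearch:9200"]   # hypothetical host
    index => "edw-%{+YYYY.MM.dd}"            # keep the existing daily-index scheme
  }
}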
