Filebeat 6 + Apache2 module : fields not exported to Logstash

Hi Everyone,

I have recently set up a full Beats 6 + ELK 6 stack: Filebeat and Metricbeat agents on 2 servers, and ELK on another one.

I've configured both Metricbeat and Filebeat the same way to send data to Logstash over TLS, and disabled the Elasticsearch output. For both of them, I loaded the ES templates and the Kibana dashboards.

Metricbeat data is correctly exported to Logstash: I can see all the fields in the Metricbeat logs and in Logstash with the rubydebug output.

On the Filebeat side, with the Apache2 module enabled, messages are properly transmitted, but the fields are not exported: only the plain message (the raw log line) is received by Logstash. As a result, I have to parse the messages manually with grok.

Is that the right way to do it? Did I miss something? Should I prefer the ES output over Logstash for this use case?

Thanks in advance!

My Logstash config:

input {
  beats {
    port => 5044
    ssl => true
    ssl_certificate => "/etc/ssl/certs/kibana.mydomain.com.crt"
    ssl_key => "/etc/ssl/private/kibana.mydomain.com.key"
    tags => [ "beats" ]
  }
}

output {
  if "beats" in [tags] {
    elasticsearch {
      hosts => ["http://localhost:9200"]
      index => "%{[@metadata][beat]}-%{[@metadata][version]}-%{+YYYY.MM.dd}"
    }
  }
}

Hello,

The way modules work in Filebeat is that most of them actually rely on an ingest pipeline in Elasticsearch to do the extraction and transformation of the original data. When you send your events to Logstash instead, the Elasticsearch output there doesn't route them through that ingest pipeline, so you only get your raw data.

There are a few solutions for that, depending on your needs and architecture.

  1. Use the Elasticsearch output in Filebeat directly, so that Filebeat sends the data to the correct ingest pipeline.
  2. Push the ingest pipelines to Elasticsearch, and use conditionals in the Logstash config to route the events to an elasticsearch output with the pipeline option configured (see the sketch after this list).
  3. Convert the ingest pipeline to a Logstash pipeline manually, with the help of a Logstash tool (this is similar to what you did).
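For the second option, the elasticsearch output in Logstash has a pipeline option that tells Elasticsearch which ingest pipeline to run the event through. A minimal sketch of that routing, assuming the Filebeat module pipelines have already been loaded into Elasticsearch and follow the usual filebeat-&lt;version&gt;-&lt;module&gt;-&lt;fileset&gt;-default naming (the exact IDs can be checked with GET _ingest/pipeline), could look like this:

output {
  if [fileset][module] == "apache2" and [fileset][name] == "access" {
    elasticsearch {
      hosts => ["http://localhost:9200"]
      index => "%{[@metadata][beat]}-%{[@metadata][version]}-%{+YYYY.MM.dd}"
      # Assumed pipeline ID; verify it against GET _ingest/pipeline on your cluster.
      pipeline => "%{[@metadata][beat]}-%{[@metadata][version]}-apache2-access-default"
    }
  }
}

With that in place, the field extraction is done by the Elasticsearch ingest node and Logstash only forwards the raw event.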

It really depends on your use case; all of the above solutions are correct.

Thanks

Thanks a lot, Pier-Hugues, for your quick and complete answer!

I chose the third option, which seems to work properly, but I had to manually add the useragent filter to the generated Logstash config.

Logstash and ingest pipelines sometimes differ in features; we are slowly converging them, which will make the tools better, but we are not there yet :slight_smile:

For anyone reading this topic: I had to wrap the resulting filter in a conditional to avoid errors when parsing anything other than access logs.

Full working filter below:

filter {
   if [fileset][module] == "apache2" and [fileset][name] == "access" {
      # Extract the Apache access log fields from the raw message
      # (handles both normal request lines and empty "-" request lines).
      grok {
         match => {
            "message" => [
               "%{IPORHOST:[apache2][access][remote_ip]} - %{DATA:[apache2][access][user_name]} \[%{HTTPDATE:[apache2][access][time]}\] \"%{WORD:[apache2][access][method]} %{DATA:[apache2][access][url]} HTTP/%{NUMBER:[apache2][access][http_version]}\" %{NUMBER:[apache2][access][response_code]} (?:%{NUMBER:[apache2][access][body_sent][bytes]}|-)( \"%{DATA:[apache2][access][referrer]}\")?( \"%{DATA:[apache2][access][agent]}\")?",
               "%{IPORHOST:[apache2][access][remote_ip]} - %{DATA:[apache2][access][user_name]} \[%{HTTPDATE:[apache2][access][time]}\] \"-\" %{NUMBER:[apache2][access][response_code]} -"
            ]
         }
      }

      # Keep the original read time, then set @timestamp from the
      # timestamp parsed out of the access log line.
      mutate {
         rename => {
            "@timestamp" => "read_timestamp"
         }
      }
      date {
         match => [
            "[apache2][access][time]",
            "dd/MMM/YYYY:H:m:s Z"
         ]
         target => "@timestamp"
      }

      useragent {
         source => "[apache2][access][agent]"
         target => "[apache2][access][user_agent]"
      }

      geoip {
         source => "[apache2][access][remote_ip]"
         target => "[apache2][access][geoip]"
      }
   }
}
