Problem with Logstash filter to parse logs from an Apache reverse proxy (Solved)

Hi all,

I have an Apache server with mod_proxy enabled acting as a reverse proxy.
For each domain there are three log files (in /var/log/apache2/) that record accesses, errors and transfers. See the example:
domaina.com_access.log
domaina.com_error.log
domaina.com_transfer.log
domainb.com_access.log
domainb.com_error.log
domainb.com_transfer.log

The VirtualHost definitions in Apache are as follows:
<VirtualHost *:80>
    ProxyPreserveHost On
    ServerAdmin admin@domaina.com
    ServerName www.domaina.com
    ServerAlias www.domaina.com
    ProxyPass / http://10.10.10.102/
    ProxyPassReverse / http://10.10.10.102/
    CustomLog "/var/log/apache2/domaina.com_access.log" common
    ErrorLog "/var/log/apache2/domaina.com_error.log"
    TransferLog "/var/log/apache2/domaina.com_transfer.log"
    Include /usr/local/modsecurity/polices/domaina.conf
</VirtualHost>

<VirtualHost *:80>
    ProxyPreserveHost On
    ServerAdmin admin@domainb.com
    ServerName www.domainb.com
    ServerAlias www.domainb.com
    ProxyPass / http://10.10.10.102/
    ProxyPassReverse / http://10.10.10.102/
    CustomLog "/var/log/apache2/domainb.com_access.log" common
    ErrorLog "/var/log/apache2/domainb.com_error.log"
    TransferLog "/var/log/apache2/domainb.com_transfer.log"
    Include /usr/local/modsecurity/polices/domainb.conf
</VirtualHost>
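
For reference, the common nickname used in the CustomLog directives writes entries in the Common Log Format, so an access line in these files would look like this (hypothetical values):

10.20.20.254 - - [31/Oct/2016:09:17:01 -0300] "GET / HTTP/1.1" 200 2326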

I use Filebeat as the prospector to send all the logs to the Elastic Stack. The Filebeat prospectors are configured as follows:
  paths:
    - /var/log/apache2/*_access.log
  input_type: log
  document_type: apache-access

  paths:
    - /var/log/apache2/*_error.log
  input_type: log
  document_type: apache-error

  paths:
    - /var/log/apache2/*_transfer.log
  input_type: log
  document_type: apache-transfer

  paths:
    - /var/log/auth.log
    - /var/log/syslog
  input_type: log
  document_type: syslog

To parse the event logs sent by Filebeat, I created the following Logstash filter for the access and transfer logs:
if [type] in [ "apache-access", "apache-transfer" ] {
  grok {
    # match => [
    #   "message", "%{COMBINEDAPACHELOG}+%{GREEDYDATA:extra_fields}",
    #   "message", "%{COMMONAPACHELOG}+%{GREEDYDATA:extra_fields}"
    # ]
    # match => { "message" => "%{COMBINEDAPACHELOG}" }
    match => [ "message", "%{IPORHOST:virtualhost}:%{INT:port:int} %{COMBINEDAPACHELOG}" ]
  }
}
else if [type] == "apache-error" {
  grok {
    match => { "message" => "\[%{WORD:dayname} %{WORD:month} %{DATA:day} %{DATA:hour}:%{DATA:minute}:%{DATA:second} %{YEAR:year}\] \[%{NOTSPACE:loglevel}\] (?:\[client %{IPORHOST:clientip}\] ){0,1}%{GREEDYDATA:message}" }
  }
}

None of the matches worked and I cannot handle the logs received by Logstash.
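
A minimal sketch for testing these grok patterns in isolation (assuming Logstash 2.x and a throwaway config file, here called test.conf) is:

input { stdin { } }

filter {
  grok {
    match => [ "message", "%{IPORHOST:virtualhost}:%{INT:port:int} %{COMBINEDAPACHELOG}" ]
  }
}

output { stdout { codec => rubydebug } }

Feeding one real access line on stdin shows immediately whether the pattern matches or the event gets a _grokparsefailure tag.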

Only the Logstash filter for syslog works. That filter is defined in a separate file.

Any suggestion?

Thanks.

Please show us an example of an event that doesn't get correctly parsed. Output from a stdout { codec => rubydebug } output is preferred.

Hi Magnus. Thank you very much for your reply.

I changed the Logstash output configuration:

output {
  elasticsearch {
    hosts => ["localhost:9200"]
    sniffing => true
    manage_template => false
    index => "%{[@metadata][beat]}-%{+YYYY.MM.dd}"
    document_type => "%{[@metadata][type]}"
  }

  stdout { codec => rubydebug }
}
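
For completeness, the input side of this pipeline (a sketch, inferred from the lsof output below showing Logstash listening on port 5044) would be a beats input like:

input {
  beats {
    port => 5044
  }
}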

I restarted the Logstash service and ran it with the -f -vv --debug flags.

Output of logstash.log

[root@elastic conf.d]# tail -f /var/log/logstash/logstash.log
{:timestamp=>"2016-10-31T08:40:13.445000-0300", :message=>"Pipeline main has been shutdown"}
{:timestamp=>"2016-10-31T08:43:23.110000-0300", :message=>"Pipeline main started"}
{:timestamp=>"2016-10-31T08:46:02.063000-0300", :message=>"SIGTERM received. Shutting down the agent.", :level=>:warn}
{:timestamp=>"2016-10-31T08:46:02.066000-0300", :message=>"stopping pipeline", :id=>"main"}
{:timestamp=>"2016-10-31T08:46:03.069000-0300", :message=>"Pipeline main has been shutdown"}
{:timestamp=>"2016-10-31T08:46:29.633000-0300", :message=>"Pipeline main started"}
{:timestamp=>"2016-10-31T08:54:20.960000-0300", :message=>"SIGTERM received. Shutting down the agent.", :level=>:warn}
{:timestamp=>"2016-10-31T08:54:20.975000-0300", :message=>"stopping pipeline", :id=>"main"}
{:timestamp=>"2016-10-31T08:54:22.033000-0300", :message=>"Pipeline main has been shutdown"}
{:timestamp=>"2016-10-31T08:55:33.978000-0300", :message=>"Pipeline main started"}

The connection to the client was established:
[root@elastic conf.d]# lsof -nPi :5044
COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
java 27004 logstash 14u IPv4 56286937 0t0 TCP *:5044 (LISTEN)
java 27004 logstash 45u IPv4 56288563 0t0 TCP 10.10.10.15:5044->10.20.20.254:36443 (ESTABLISHED)

Elasticsearch is listening on TCP/9200:
[root@elastic conf.d]# lsof -nPi :9200
COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
java 16684 elasticsearch 91u IPv4 25142346 0t0 TCP 127.0.0.1:9200 (LISTEN)
java 16684 elasticsearch 4324u IPv4 56286962 0t0 TCP 127.0.0.1:9200->127.0.0.1:33458 (ESTABLISHED)
java 27004 logstash 47u IPv4 56286961 0t0 TCP 127.0.0.1:33458->127.0.0.1:9200 (ESTABLISHED)

The output of stdout { codec => rubydebug } ends up in /var/log/logstash/logstash.out (or is it logstash.stdout?) so that's the file I'm interested in.

Sorry for my mistake.

Here is the output of /var/log/logstash/logstash.stdout:

[root@elastic ~]# cat /var/log/logstash/logstash.stdout
Sending logstash logs to /var/log/logstash/logstash.log.
{
"syslog_timestamp" => "Oct 31 09:17:01",
"syslog_hostname" => "reverso-apache",
"syslog_program" => "CRON",
"syslog_pid" => "24480",
"syslog_message" => "pam_unix(cron:session): session opened for user root by (uid=0)",
"received_at" => "2016-10-31T12:17:10.820Z",
"received_from" => "reverso-apache",
"syslog_severity_code" => 5,
"syslog_facility_code" => 1,
"syslog_facility" => "user-level",
"syslog_severity" => "notice"
}
{
"message" => "Oct 31 09:17:01 reverso-apache CRON[24480]: pam_unix(cron:session): session closed for user root",
"@version" => "1",
"@timestamp" => "2016-10-31T12:17:01.000Z",
"input_type" => "log",
"count" => 1,
"fields" => nil,
"beat" => {
"hostname" => "reverso-apache",
"name" => "reverso-apache"
},
"source" => "/var/log/auth.log",
"offset" => 72838,
"type" => "syslog",
"host" => "reverso-apache",
"tags" => [
[0] "beats_input_codec_plain_applied"
],
"syslog_timestamp" => "Oct 31 09:17:01",
"syslog_hostname" => "reverso-apache",
"syslog_program" => "CRON",
"syslog_pid" => "24480",
"syslog_message" => "pam_unix(cron:session): session closed for user root",
"received_at" => "2016-10-31T12:17:10.820Z",
"received_from" => "reverso-apache",
"syslog_severity_code" => 5,
"syslog_facility_code" => 1,
"syslog_facility" => "user-level",
"syslog_severity" => "notice"
}

Okay, but those are syslog entries and that's not what you're asking for help with. What does an Apache log event look like?

Hi Magnus.

I reviewed the filebeat.yml file and found errors in the paths section.
Dashes were missing between the prospector definitions. See the example with the error:

  paths:
    - /var/log/apache2/*_access.log
  input_type: log
  document_type: apache-access

  paths:
    - /var/log/apache2/*_error.log
  input_type: log
  document_type: apache-error

  paths:
    - /var/log/apache2/*_transfer.log
  input_type: log
  document_type: apache-transfer

  paths:
    - /var/log/auth.log
    - /var/log/syslog
  input_type: log
  document_type: syslog

See the corrected example:

- paths:
    - /var/log/apache2/*_access.log
  input_type: log
  document_type: apache-access

- paths:
    - /var/log/apache2/*_error.log
  input_type: log
  document_type: apache-error

- paths:
    - /var/log/apache2/*_transfer.log
  input_type: log
  document_type: apache-transfer

- paths:
    - /var/log/auth.log
    - /var/log/syslog
  input_type: log
  document_type: syslog

Without the dashes, Filebeat only ever considered the last entry in the section, so the document_type was always syslog and the events were never classified as apache.
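
As a sketch of why this happens (assuming a YAML parser where the last duplicate key wins): without the dashes, the four blocks parse as a single mapping instead of a list of four prospectors, so the section is effectively equivalent to writing only:

  paths:
    - /var/log/auth.log
    - /var/log/syslog
  input_type: log
  document_type: syslog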

Thank you very much for your help.