Logstash Filter Issue with Apache2

Hi All,

I've recently started using the trio of Logstash + Elasticsearch + Kibana (ELK) to analyze my Apache logs during my daily work. However, I'm having trouble with grok when it comes to filtering the log messages and parsing them for Elasticsearch. Can someone please point me in the right direction as to what is wrong?

  input {
    file {
      path  =>  "/var/log/apache2/error.log"
      type  =>  "apacheerror"
    }
    file {
      path  =>  ["/var/log/apache2/access_log.http", "/var/log/apache2/access_log.https"]
      type  =>  "apacheaccess"
    }
    file {
      path  =>  "/var/log/messages"
      type  =>  "syslog"
    }
    file {
      path  =>  ["/var/log/mysql.log", "/var/log/mysql.err"]
      type  =>  "MySQL"
    }
    file {
      path => "/var/log/apache2/*.ls_json"
      tags => "apache_json"
      codec => "json"
    }
  }

  filter {
    if [type] == "syslog" {
      grok {
        match => { "message" => "%{SYSLOGTIMESTAMP:syslog_timestamp} %{SYSLOGHOST:syslog_hostname} %{DATA:syslog_program}(?:\[%{POSINT:syslog_pid}\])?: %{GREEDYDATA:syslog_message}" }
        add_field => [ "received_at", "%{@timestamp}" ]
        add_field => [ "received_from", "%{host}" ]
      }
      syslog_pri { }
      date {
        match => [ "syslog_timestamp", "MMM  d HH:mm:ss", "MMM dd HH:mm:ss" ]
      }
    }
    if [type] == "apacheerror" {
      grok {
        match => { "message" => "^\s{0,}\[(?<timestamp>%{DAY} %{MONTH} %{MONTHDAY} %{TIME} %{YEAR})\]\s+(\[%{WORD:loglevel}\]\s+)?%{GREEDYDATA:message}" }
        add_field => [ "apache_received_at", "%{@timestamp}" ]
        add_field => [ "apache_received_from", "%{host}" ]
      }
    }
  }


  output {
    elasticsearch { host => localhost }
  }

Above is my Logstash Configuration file.

And below is a set of sample lines from my Apache error.log file.

  [Thu Jun 09 06:02:02 2016] [error] Exception KeyError: KeyError(140605482088256,) in <module 'threading' from '/usr/lib/python2.6/threading.pyc'> ignored
  [Thu Jun 09 06:02:03 2016] [error] Exception KeyError: KeyError(140605482088256,) in <module 'threading' from '/usr/lib/python2.6/threading.pyc'> ignored
  [Thu Jun 09 06:02:03 2016] [notice] caught SIGTERM, shutting down
  [Thu Jun 09 06:02:04 2016] [notice] Apache/2.2.14 (Ubuntu) mod_ssl/2.2.14 OpenSSL/0.9.8k mod_wsgi/2.8 Python/2.6.5 configured -- resuming normal operations
  [Thu Jun 09 06:06:39 2016] [error] [client 000.000.000.000] This is it at /usr/local/DOCS/xyz/abcd.cgi line 56, 
  [Thu Jun 09 06:06:56 2016] [error] Exception KeyError: KeyError(139653229332288,) in <module 'threading' from '/usr/lib/python2.6/threading.pyc'> ignored
  [Thu Jun 09 06:06:57 2016] [error] Exception KeyError: KeyError(139653229332288,) in <module 'threading' from '/usr/lib/python2.6/threading.pyc'> ignored
  [Thu Jun 09 06:06:58 2016] [error] Exception KeyError: KeyError(139653229332288,) in <module 'threading' from '/usr/lib/python2.6/threading.pyc'> ignored
  [Thu Jun 09 06:06:59 2016] [error] Exception KeyError: KeyError(139653229332288,) in <module 'threading' from '/usr/lib/python2.6/threading.pyc'> ignored
  [Thu Jun 09 06:07:00 2016] [error] Exception KeyError: KeyError(139653229332288,) in <module 'threading' from '/usr/lib/python2.6/threading.pyc'> ignored
  [Thu Jun 09 06:07:01 2016] [error] Exception KeyError: KeyError(139653229332288,) in <module 'threading' from '/usr/lib/python2.6/threading.pyc'> ignored
  [Thu Jun 09 06:07:02 2016] [error] Exception KeyError: KeyError(139653229332288,) in <module 'threading' from '/usr/lib/python2.6/threading.pyc'> ignored
  [Thu Jun 09 06:07:03 2016] [error] Exception KeyError: KeyError(139653229332288,) in <module 'threading' from '/usr/lib/python2.6/threading.pyc'> ignored
  [Thu Jun 09 06:07:04 2016] [error] Exception KeyError: KeyError(139653229332288,) in <module 'threading' from '/usr/lib/python2.6/threading.pyc'> ignored
  [Thu Jun 09 06:07:05 2016] [error] Exception KeyError: KeyError(139653229332288,) in <module 'threading' from '/usr/lib/python2.6/threading.pyc'> ignored

These Apache log messages show up as one complete, unparsed string when I look at the 'message' field in Kibana. However, the syslog part works fine.

I can't find anything obviously wrong with your grok expression. Start with the simplest possible expression, ^\s{0,}\[(?<timestamp>%{DAY}) and make sure that works. Then add tokens until it breaks. Are your Apache error log events tagged with _grokparsefailure?
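
A minimal sketch of that first step, assuming the same "apacheerror" type as in the config above. Only the start of the pattern is kept (with the named group closed so the expression compiles), and the _grokparsefailure tag that grok adds by default is used to mark events that still fail. The apache_unparsed field is a made-up name for illustration.

    filter {
      if [type] == "apacheerror" {
        grok {
          # Step 1: only the opening bracket and the day name.
          match => { "message" => "^\s{0,}\[(?<timestamp>%{DAY})" }
        }
        # grok adds this tag by default when no pattern matches.
        if "_grokparsefailure" in [tags] {
          mutate { add_field => { "apache_unparsed" => "true" } }
        }
      }
    }

Once this matches, the next token (%{MONTH}, then %{MONTHDAY}, and so on) can be added one at a time until the failure tag reappears.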

I strongly recommend that you use a stdout { codec => rubydebug } output while debugging.
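
For reference, that debugging output could sit next to the existing Elasticsearch output like this; a sketch that simply reuses the output block from the config above.

    output {
      elasticsearch { host => localhost }
      # Print every event, with all of its fields and tags, to the console.
      stdout { codec => rubydebug }
    }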

Thanks for your quick response.

I think I might have figured out the reason behind the mess I was facing.

   match => { "message" => "^\s{0,}\[(?<timestamp>%{DAY} %{MONTH} %{MONTHDAY} %{TIME} %{YEAR})\]\s+(\[%{WORD:loglevel}\]\s+)?%{GREEDYDATA:message}" }

The %{GREEDYDATA:message} capture in my pattern collides with the existing message field, which already holds the whole raw line, so the captured text gets overridden by the actual line itself.

I changed %{GREEDYDATA:message} to %{GREEDYDATA:apache_message} and that seems to have resolved my problem.
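
The renamed capture, dropped into the apacheerror filter from the config above, would look roughly like this. The commented-out overwrite setting is grok's own option for replacing an existing field such as message; it is shown only as an alternative and was not tried in this thread.

    filter {
      if [type] == "apacheerror" {
        grok {
          match => { "message" => "^\s{0,}\[(?<timestamp>%{DAY} %{MONTH} %{MONTHDAY} %{TIME} %{YEAR})\]\s+(\[%{WORD:loglevel}\]\s+)?%{GREEDYDATA:apache_message}" }
          # Alternative (not tried in this thread): keep the capture name "message" and add
          # overwrite => [ "message" ]
          add_field => [ "apache_received_at", "%{@timestamp}" ]
          add_field => [ "apache_received_from", "%{host}" ]
        }
      }
    }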

In case someone wants to reuse this, the following is the regex that worked perfectly for me.

^\s{0,}\[(?<timestamp>%{DAY} %{MONTH} %{MONTHDAY} %{TIME} %{YEAR})\]\s+(\[%{WORD:loglevel}\]\s+)?(\[client.*\])?%{GREEDYDATA:apache_message}(, referer\:.+)
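
One thing to watch when reusing it: the trailing (, referer\:.+) group is not optional, so as written the expression only matches lines that contain ", referer:". A possible variant, assuming lines without a referer should parse too, makes that group optional and switches to the lazy %{DATA} so the referer is still split off when it is present. The referer field name is made up here, and this variant has not been run against the logs in this thread.

    filter {
      if [type] == "apacheerror" {
        grok {
          # %{DATA} is lazy, so the optional referer group gets a chance to match before the end of the line.
          match => { "message" => "^\s{0,}\[(?<timestamp>%{DAY} %{MONTH} %{MONTHDAY} %{TIME} %{YEAR})\]\s+(\[%{WORD:loglevel}\]\s+)?(\[client.*\])?%{DATA:apache_message}(, referer\:\s*%{GREEDYDATA:referer})?$" }
        }
      }
    }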

May the Force be with you.