Apache grok filter not working the way I expect it to. Should I be using a different approach?

I'm trying to set up filebeat/ELK such that I can drop an apache log file into a directory and it will be indexed automatically.

I'm currently succeeding in having the files indexed; however, each log file is just being indexed as one long string. For example, this is copied from my Kibana dashboard:

| Time | message | source |
| --- | --- | --- |
| March 19th 2018, 12:53:35.172 | 66.249.79.132 - - [26/Feb/2018:10:34:58 +0000] "GET /places/businesses/amc/ HTTP/1.0" 404 60 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)" | /home/me/apache.log |

I'm expecting these logs to be indexed such that each component is an individual field instead of just one long string.

Maybe I'm misunderstanding how to use the grok filter, or there are additional settings I need to include. For example, perhaps I did not properly install the geoip plugin? I've tried both "%{COMMONAPACHELOG}" and "%{COMBINEDAPACHELOG}", but I get the same result either way: the log lines are indexed as just one string.

Here are my logstash conf files:

02-beats-input.conf

 input {
   beats {
     port => 5044
   }
 }

10-syslog-filter.conf

 filter {
   if [type] in [ "apache", "apache_access", "apache-access", "syslog", "log" ] {
     grok {
       match => { "message" => "%{COMMONAPACHELOG}" }
     }
     mutate {
       convert => ["response", "integer"]
       convert => ["bytes", "integer"]
       convert => ["responsetime", "float"]
     }
     geoip {
       source => "clientip"
       target => "geoip"
       add_tag => [ "apache-geoip" ]
     }
     date {
       match => [ "timestamp", "dd/MMM/YYYY:HH:mm:ss Z" ]
       remove_field => [ "timestamp" ]
     }
     useragent {
       source => "agent"
     }
   }
   if [type] in ["apache_error", "apache-error"] {
     grok {
       match => ["message", "\[%{WORD:dayname} %{WORD:month} %{DATA:day} %{DATA:hour}:%{DATA:minute}:%{DATA:second} %{YEAR:year}\] \[%{NOTSPACE:loglevel}\] (?:\[client %{IPORHOST:clientip}\] ){0,1}%{GREEDYDATA:message}"]
       overwrite => [ "message" ]
     }
     mutate {
       add_field => {
         "time_stamp" => "%{day}/%{month}/%{year}:%{hour}:%{minute}:%{second}"
       }
     }
     date {
       match => ["time_stamp", "dd/MMM/YYYY:HH:mm:ss"]
       remove_field => [ "time_stamp", "day", "dayname", "month", "hour", "minute", "second", "year" ]
     }
   }
 }

30-elasticsearch-output.conf

 output {
   elasticsearch {
     hosts => ["localhost:9200"]
     sniffing => true
     manage_template => false
     index => "%{[@metadata][beat]}-%{+YYYY.MM.dd}"
     document_type => "%{[@metadata][type]}"
   }
 }

Please show a complete example event. Copy/paste from Kibana's JSON tab.

Is this what you're asking for?

 {
   "_index": "filebeat-2018.03.19",
   "_type": "doc",
   "_id": "AWI_hI6QvVPbTbcX3xuE",
   "_version": 1,
   "_score": null,
   "_source": {
     "@timestamp": "2018-03-19T18:27:49.176Z",
     "beat": {
       "hostname": "mycomp",
       "name": "mycomp",
       "version": "5.6.8"
     },
     "input_type": "log",
     "message": "174.141.131.19 - - [22/Feb/2018:00:14:02 +0000] \"POST /wp-admin/admin-ajax.php      HTTP/1.0\" 200 123 \"https://SOMEDOMAIN.com/wp-admin/post.php?post=11080&action=edit\" \"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/64.0.3282.167      Safari/537.36\"",
     "offset": 5825514,
     "source": "/home/me/ftflogs/apache9.log",
     "type": "log"
   },
   "fields": {
     "@timestamp": [
       1521484069176
     ]
   },
   "sort": [
     1521484069176
   ]
 }

Things look correct from here. Comment out the conditionals to see if those are what's stopping the filtering.
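
If you want to rule out the pattern itself, a quick way is to run Logstash against stdin, which takes Filebeat and the conditionals out of the picture entirely. A minimal sketch (the file name test-grok.conf is made up); run it with bin/logstash -f test-grok.conf and paste one of your log lines:

 # test-grok.conf: read lines from stdin, apply the same grok pattern
 # your filter uses, and print the parsed event to the console.
 input {
   stdin { }
 }
 filter {
   grok {
     match => { "message" => "%{COMMONAPACHELOG}" }
   }
 }
 output {
   stdout { codec => rubydebug }
 }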

(Unrelated to your problem, but what you have is a combined log, not a common log.)
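
If you do switch patterns, the grok stays a one-liner; %{COMBINEDAPACHELOG} also pulls the referrer and user agent out into their own fields, roughly:

 grok {
   match => { "message" => "%{COMBINEDAPACHELOG}" }
 }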

In case anyone runs into this issue themselves, I figured out the problem.

Within my filebeat.yml file, I was sending data directly to Elasticsearch and NOT to Logstash. I commented out the elasticsearch output options, uncommented the logstash output, set the path, and now it's working correctly.
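
For reference, the output section of my filebeat.yml now looks roughly like this (hosts adjusted to wherever your Elasticsearch and Logstash live):

 # Elasticsearch output commented out:
 # output.elasticsearch:
 #   hosts: ["localhost:9200"]
 # Logstash output enabled instead (port matches the beats input above):
 output.logstash:
   hosts: ["localhost:5044"]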

Thanks for helping me work through the problem!!
