Apache2 module parsing


(Raman Gupta) #1

I have a Kubernetes pod annotated with the hint co.elastic.logs/module: apache2. The source log looks like this:

::ffff:10.5.1.62 - - [05/Nov/2018:15:39:18 +0000] "GET /config HTTP/1.1" 304 - "https://myserver.com/" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.67 Safari/537.36"

and the output from Filebeat 6.4.2, as seen in Kibana, shows this in the error.message field:

Provided Grok expressions do not match field value: [::ffff:10.5.1.62 - - [05/Nov/2018:15:39:18 +0000] \"GET /config HTTP/1.1\" 304 - \"https://myserver.com/\" \"Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.67 Safari/537.36\"]

Oddly, looking at the grok expression in the Filebeat source here https://github.com/elastic/beats/blob/v6.4.2/filebeat/module/apache2/access/ingest/default.json, I don't see any particular reason why this line would fail to parse.

It does in fact parse just fine in the Grok debugger in Kibana, using the expression from the JSON above (with the value unescaped).
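For anyone who wants to repeat that check outside Kibana, here is a rough Python sketch. The regular expression is only a simplified approximation of the first grok pattern in the access pipeline, not the real grok definitions (IPORHOST, HTTPDATE and so on are stricter), and `ACCESS_RE`, `SAMPLE` and `parse_access_line` are hypothetical names made up for this example. It only shows that a line starting with an IPv4-mapped IPv6 address has the shape the access pattern expects.

```python
import re

# Simplified stand-in for the first grok pattern of the access pipeline.
# Assumption: these character classes are looser than the real grok
# definitions; they are good enough for a shape check only.
ACCESS_RE = re.compile(
    r'(?P<remote_ip>[0-9A-Fa-f:.]+|[\w.-]+) - (?P<user_name>.*?) '
    r'\[(?P<time>\d{2}/\w{3}/\d{4}:\d{2}:\d{2}:\d{2} [+-]\d{4})\] '
    r'"(?P<method>\w+) (?P<url>.*?) HTTP/(?P<http_version>[\d.]+)" '
    r'(?P<response_code>\d+) (?:(?P<body_sent_bytes>\d+)|-)'
    r'(?: "(?P<referrer>.*?)")?(?: "(?P<agent>.*?)")?'
)

# The log line from this post, unchanged.
SAMPLE = (
    '::ffff:10.5.1.62 - - [05/Nov/2018:15:39:18 +0000] '
    '"GET /config HTTP/1.1" 304 - "https://myserver.com/" '
    '"Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 '
    '(KHTML, like Gecko) Chrome/70.0.3538.67 Safari/537.36"'
)


def parse_access_line(line):
    """Return the captured fields as a dict, or None if the line does not match."""
    match = ACCESS_RE.match(line)
    return match.groupdict() if match else None


if __name__ == "__main__":
    fields = parse_access_line(SAMPLE)
    for name, value in (fields or {}).items():
        print(f"{name}: {value}")
```

If the line matches here and in the Grok debugger, the pattern itself is probably not the problem, and it is worth checking which pipeline the failing documents actually went through.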

Any ideas where to look next?

BTW, Kibana is showing stdout for the stream.


(Noémi Ványi) #2

I tried to reproduce your problem but could not. Are you sure that the pipeline on the Ingest node is the same as the one in the Filebeat source?


(Raman Gupta) #4

@kvch I've checked using the _ingest/pipeline endpoint and it looks like it is:

    "filebeat-6.4.2-apache2-access-default": {
        "description": "Pipeline for parsing Apache2 access logs. Requires the geoip and user_agent plugins.",
        "on_failure": [
            {
                "set": {
                    "field": "error.message",
                    "value": "{{ _ingest.on_failure_message }}"
                }
            }
        ],
        "processors": [
            {
                "grok": {
                    "field": "message",
                    "ignore_missing": true,
                    "patterns": [
                        "%{IPORHOST:apache2.access.remote_ip} - %{DATA:apache2.access.user_name} \\[%{HTTPDATE:apache2.access.time}\\] \"%{WORD:apache2.access.method} %{DATA:apache2.access.url} HTTP/%{NUMBER:apache2.access.http_version}\" %{NUMBER:apache2.access.response_code} (?:%{NUMBER:apache2.access.body_sent.bytes}|-)( \"%{DATA:apache2.access.referrer}\")?( \"%{DATA:apache2.access.agent}\")?",
                        "%{IPORHOST:apache2.access.remote_ip} - %{DATA:apache2.access.user_name} \\[%{HTTPDATE:apache2.access.time}\\] \"-\" %{NUMBER:apache2.access.response_code} -"
                    ]
                }
            },
            {
                "remove": {
                    "field": "message"
                }
            },
            {
                "rename": {
                    "field": "@timestamp",
                    "target_field": "read_timestamp"
                }
            },
            },
            {
                "date": {
                    "field": "apache2.access.time",
                    "formats": [
                        "dd/MMM/YYYY:H:m:s Z"
                    ],
                    "target_field": "@timestamp"
                }
            },
            {
                "remove": {
                    "field": "apache2.access.time"
                }
            },
            {
                "user_agent": {
                    "field": "apache2.access.agent",
                    "ignore_failure": true,
                    "target_field": "apache2.access.user_agent"
                }
            },
            {
                "remove": {
                    "field": "apache2.access.agent",
                    "ignore_failure": true
                }
            },
            {
                "geoip": {
                    "field": "apache2.access.remote_ip",
                    "target_field": "apache2.access.geoip"
                }
            }
        ]
    },

Is it possible the filebeat-6.4.2-apache2-error-pipeline is being used here for some reason instead?
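One way to test that guess would be to run the same log line through each pipeline with the Elasticsearch simulate pipeline API (`POST _ingest/pipeline/<id>/_simulate`) and compare the results. Below is a minimal sketch that only builds the request. The Elasticsearch address `http://localhost:9200`, the helper name `build_simulate_request` and the placeholder `@timestamp` value are assumptions for illustration; the `@timestamp` field is included because the access pipeline shown above renames it.

```python
import json
from urllib.request import Request


def build_simulate_request(es_url, pipeline_id, message):
    """Build (but do not send) a simulate request for one log line."""
    body = {
        "docs": [
            {
                "_source": {
                    "message": message,
                    # Placeholder value: the access pipeline renames
                    # @timestamp, so the field has to exist in the test doc.
                    "@timestamp": "2018-11-05T15:39:18.000Z",
                }
            }
        ]
    }
    return Request(
        url=f"{es_url.rstrip('/')}/_ingest/pipeline/{pipeline_id}/_simulate",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    line = '::ffff:10.5.1.62 - - [05/Nov/2018:15:39:18 +0000] "GET /config HTTP/1.1" 304 -'
    for pipeline in (
        "filebeat-6.4.2-apache2-access-default",
        "filebeat-6.4.2-apache2-error-pipeline",
    ):
        req = build_simulate_request("http://localhost:9200", pipeline, line)
        print(req.get_method(), req.full_url)
```

Passing each request to `urllib.request.urlopen` against a real cluster should return the simulated documents. If the error pipeline's result carries the same error.message as the failing documents in Kibana, that would point to the log lines being routed through the wrong fileset.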


(Raman Gupta) #5

Ok, if I add another annotation to the Kubernetes container:

co.elastic.logs/fileset.stdout: access

then all the log lines are parsed correctly. It is very odd to me that a) some log lines parse correctly without this annotation while others do not, and b) this isn't the default.

