Filebeat/Logstash Apache Access Logs timestamp issue

Hello everyone

I'm new to the ELK world and need some help with a basic Apache HTTP Filebeat -> Logstash -> Elasticsearch <- Kibana filter/pipeline, for learning purposes.

Version:
8.3.3

Problem:

A dummy Apache access log line that I want to visualize in Kibana, including GeoIP:

57.85.164.98 - - [20/Sep/2017:15:31:04 +0200] "GET /favicon.ico HTTP/1.1" 200 7581 "https://codingexplained.com/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/60.0.3112.90 Safari/537.36"

The '@timestamp' field does not match the timestamp inside the Apache access log. As a result, I can't filter on the 'timestamp' field (the Apache one) in Kibana, nor does the default dashboard work.

I assume I have to map the 'timestamp' field onto '@timestamp' somehow?
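From what I've read so far, that mapping should be done with a date filter writing into @timestamp. A minimal sketch of what I mean (assuming grok has already extracted the Apache time into a top-level timestamp field):

date {
    # parse the Apache access time and write it into @timestamp (the default target)
    match => [ "timestamp", "dd/MMM/yyyy:HH:mm:ss Z" ]
}

Is that the right approach?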

I have read the manual several times, but it still doesn't work. I also noticed that the example config there seems to be outdated with regard to ECS?

Here is my pipeline/filter:

input {
    beats {
        port => 5044
        host => "0.0.0.0"
    }
}

filter {
    if [event][dataset] != "apache.access" {
        drop { }
    }

    grok {
        match => { "[event][original]" => '%{HTTPD_COMBINEDLOG}' }
    }

    mutate {
        remove_field => ["[event][original]", "log", "input", "service", "host", "ecs", "@version"]
    }

    grok {
        match => {
            "[source][address]" => "^(%{IP:[source][ip]}|%{HOSTNAME:[source][domain]})"
        }
    }

    date {
        match => [ "[apache][access][time]", "dd/MMM/yyyy:H:m:s Z" ]
    }

    useragent {
        source => "[user_agent][original]"
        target => "[user_agent]"
    }

    geoip {
        source => "[source][ip]"
        target => "[source][geo]"
    }
}

output {
    elasticsearch {
        hosts => "localhost"
        manage_template => false
        index => "%{[@metadata][beat]}-%{[@metadata][version]}-%{+YYYY-MM-dd}"
    }
}

Hi @SirStephanikus, welcome to the community!

Did you set up the Apache module in Filebeat?

If so, this should work automatically for you: the Apache logs will be completely parsed, with the correct timestamp, GeoIP, etc. That is the point of the module.

Is that what you want?

Do you really want to use Logstash? (That is fine, but it will take a bit more work.)

What I would suggest is to set up Filebeat shipping directly to Elasticsearch using the quickstart here.

Get that working first...

and then, if you want to pass through Logstash, we can show you how to do that... using this passthrough-style Logstash config here.

Hello Stephenb

Yes I did, and I really do want to ship the stream from Filebeat to Logstash (for learning purposes, with the goal of really using every field and understanding the nuts and bolts).

Is there perhaps an error or mistake in my configuration above?

Yes, most likely.

Either you parse everything manually, or you use the module... it seems that you want to parse everything manually in Logstash for learning purposes.

If that is the case, the first thing you need to do is add a debug output:

output {
    elasticsearch {
        hosts => "localhost"
        manage_template => false
        index => "%{[@metadata][beat]}-%{[@metadata][version]}-%{+YYYY-MM-dd}"
    }
    stdout { codec => rubydebug }
}

Start Logstash with the -r flag so it will automatically reload the config when you change it.

The date filter will automatically target @timestamp.
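To be explicit: the date filter's target option defaults to @timestamp, so these two forms are equivalent (a sketch; the timestamp field is whatever your grok produced):

date {
    match  => [ "timestamp", "dd/MMM/yyyy:HH:mm:ss Z" ]
    target => "@timestamp"   # this is the default; shown only for clarity
}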

You will need to show us what your output looks like...

Why don't you start with this exact example, get it working, and go from there? Note it has the debug output.

You have to show us the output, otherwise we cannot help debug.

Hello stephenb,

thanks for your quick response and your help.

I did that tutorial multiple times; however, based on your suggestion I did it again with a clean Elasticsearch instance.

The @timestamp field is now equal to the Apache timestamp.

What I actually don't understand is: why do I get a _grokparsefailure three times with the 1:1 configuration from the manual?

Elasticsearch:

{
  "took": 3,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 4,
      "relation": "eq"
    },
    "max_score": 1,
    "hits": [
      {
        "_index": ".ds-logs-generic-default-2022.08.31-000001",
        "_id": "3e0o9YIBuDGuteorJyfH",
        "_score": 1,
        "_source": {
          "@version": "1",
          "@timestamp": "2022-08-31T18:27:05.457042Z",
          "tags": [
            "_grokparsefailure"
          ],
          "host": {
            "hostname": "rocky-8-1"
          },
          "message": "",
          "event": {
            "original": ""
          },
          "data_stream": {
            "type": "logs",
            "dataset": "generic",
            "namespace": "default"
          }
        }
      },
      {
        "_index": ".ds-logs-generic-default-2022.08.31-000001",
        "_id": "3u0o9YIBuDGuteorJyfH",
        "_score": 1,
        "_source": {
          "@version": "1",
          "@timestamp": "2022-08-31T18:27:05.446261Z",
          "tags": [
            "_grokparsefailure"
          ],
          "host": {
            "hostname": "rocky-8-1"
          },
          "message": "",
          "event": {
            "original": ""
          },
          "data_stream": {
            "type": "logs",
            "dataset": "generic",
            "namespace": "default"
          }
        }
      },
      {
        "_index": ".ds-logs-generic-default-2022.08.31-000001",
        "_id": "3-0o9YIBuDGuteorJyfH",
        "_score": 1,
        "_source": {
          "user_agent": {
            "original": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.9; rv:25.0) Gecko/20100101 Firefox/25.0"
          },
          "@version": "1",
          "@timestamp": "2013-12-11T08:01:45Z",
          "url": {
            "original": "/xampp/status.php"
          },
          "host": {
            "hostname": "rocky-8-1"
          },
          "timestamp": "11/Dec/2013:00:01:45 -0800",
          "message": "127.0.0.1 - - [11/Dec/2013:00:01:45 -0800] \"GET /xampp/status.php HTTP/1.1\" 200 3891 \"http://cadenza/xampp/navi.php\" \"Mozilla/5.0 (Macintosh; Intel Mac OS X 10.9; rv:25.0) Gecko/20100101 Firefox/25.0\"",
          "event": {
            "original": "127.0.0.1 - - [11/Dec/2013:00:01:45 -0800] \"GET /xampp/status.php HTTP/1.1\" 200 3891 \"http://cadenza/xampp/navi.php\" \"Mozilla/5.0 (Macintosh; Intel Mac OS X 10.9; rv:25.0) Gecko/20100101 Firefox/25.0\""
          },
          "source": {
            "address": "127.0.0.1"
          },
          "http": {
            "request": {
              "method": "GET",
              "referrer": "http://cadenza/xampp/navi.php"
            },
            "version": "1.1",
            "response": {
              "status_code": 200,
              "body": {
                "bytes": 3891
              }
            }
          },
          "data_stream": {
            "type": "logs",
            "dataset": "generic",
            "namespace": "default"
          }
        }
      },
      {
        "_index": ".ds-logs-generic-default-2022.08.31-000001",
        "_id": "4O0o9YIBuDGuteorJyfH",
        "_score": 1,
        "_source": {
          "@version": "1",
          "@timestamp": "2022-08-31T18:27:05.458044Z",
          "tags": [
            "_grokparsefailure"
          ],
          "host": {
            "hostname": "rocky-8-1"
          },
          "message": "",
          "event": {
            "original": ""
          },
          "data_stream": {
            "type": "logs",
            "dataset": "generic",
            "namespace": "default"
          }
        }
      }
    ]
  }
}

Logstash:

input { stdin { } }

filter {
  grok {
    match => { "message" => "%{COMBINEDAPACHELOG}" }
    # arguably more "correct":
    #match => { "[event][original]" => '%{HTTPD_COMBINEDLOG}' }
  }

  date {
    match => [ "timestamp" , "dd/MMM/yyyy:HH:mm:ss Z" ]
  }
}

output {
  elasticsearch { hosts => ["localhost:9200"] }
  stdout { codec => rubydebug }
}

Hint/Update:
What I noticed regarding the date issue from my initial post (my configuration was based on a video course for ELK 6-7):

# Two digits, like "01"
match => [ "timestamp", "dd/MMM/yyyy:HH:mm:ss Z" ]
# Single digit, like "1"
match => [ "[apache][access][time]", "dd/MMM/yyyy:H:m:s Z" ]

Perhaps that single malformed date pattern caused all the headache?

Yup, did not notice Sep.

Also, why are you removing the message field? I always leave that in while debugging.

I don't know if you actually did it exactly as in the manual here.

From the manual

NOTICE: "%{COMBINEDAPACHELOG}", not '%{HTTPD_COMBINEDLOG}'

input { stdin { } }

filter {
  grok {
    match => { "message" => "%{COMBINEDAPACHELOG}" }
  }
  date {
    match => [ "timestamp" , "dd/MMM/yyyy:HH:mm:ss Z" ]
  }
}

output {
  elasticsearch { hosts => ["localhost:9200"] }
  stdout { codec => rubydebug }
}

I ran that... and got the correct results

I am not sure what you are actually doing, but the sample code plus your sample logs = correct results. All you need to do next is put in the file input.

{
    "@timestamp" => 2017-09-20T13:31:04Z,
          "host" => {
        "hostname" => "hyperion"
    },
        "source" => {
        "address" => "57.85.164.98"
    },
          "http" => {
         "request" => {
              "method" => "GET",
            "referrer" => "https://codingexplained.com/"
        },
        "response" => {
                   "body" => {
                "bytes" => 7581
            },
            "status_code" => 200
        },
         "version" => "1.1"
    },
       "message" => "57.85.164.98 - - [20/Sep/2017:15:31:04 +0200] \"GET /favicon.ico HTTP/1.1\" 200 7581 \"https://codingexplained.com/\" \"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/60.0.3112.90 Safari/537.36\"",
      "@version" => "1",
    "user_agent" => {
        "original" => "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/60.0.3112.90 Safari/537.36"
    },
     "timestamp" => "20/Sep/2017:15:31:04 +0200",
           "url" => {
        "original" => "/favicon.ico"
    },
         "event" => {
        "original" => "57.85.164.98 - - [20/Sep/2017:15:31:04 +0200] \"GET /favicon.ico HTTP/1.1\" 200 7581 \"https://codingexplained.com/\" \"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/60.0.3112.90 Safari/537.36\""
    }
}
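Swapping the stdin input for a file input would look roughly like this (a sketch; the path is just an example, adjust it to wherever your access log lives):

input {
    file {
        path => "/var/log/httpd/access_log"    # example path, not your actual one
        start_position => "beginning"
    }
}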

I commented this part out. The reason why I want to use it is that the current grok patterns file states that COMBINEDAPACHELOG is deprecated; look at this Git output:

HTTPDUSER %{EMAILADDRESS}|%{USER}
HTTPDERROR_DATE %{DAY} %{MONTH} %{MONTHDAY} %{TIME} %{YEAR}

# Log formats
HTTPD_COMMONLOG %{IPORHOST:[source][address]} (?:-|%{HTTPDUSER:[apache][access][user][identity]}) (?:-|%{HTTPDUSER:[user][name]}) \[%{HTTPDATE:timestamp}\] "(?:%{WORD:[http][request][method]} %{NOTSPACE:[url][original]}(?: HTTP/%{NUMBER:[http][version]})?|%{DATA})" (?:-|%{INT:[http][response][status_code]:int}) (?:-|%{INT:[http][response][body][bytes]:int})
# :long - %{INT:[http][response][body][bytes]:int}
HTTPD_COMBINEDLOG %{HTTPD_COMMONLOG} "(?:-|%{DATA:[http][request][referrer]})" "(?:-|%{DATA:[user_agent][original]})"

# Error logs
HTTPD20_ERRORLOG \[%{HTTPDERROR_DATE:timestamp}\] \[%{LOGLEVEL:[log][level]}\] (?:\[client %{IPORHOST:[source][address]}\] )?%{GREEDYDATA:message}
HTTPD24_ERRORLOG \[%{HTTPDERROR_DATE:timestamp}\] \[(?:%{WORD:[apache][error][module]})?:%{LOGLEVEL:[log][level]}\] \[pid %{POSINT:[process][pid]:int}(:tid %{INT:[process][thread][id]:int})?\](?: \(%{POSINT:[apache][error][proxy][error][code]?}\)%{DATA:[apache][error][proxy][error][message]}:)?(?: \[client %{IPORHOST:[source][address]}(?::%{POSINT:[source][port]:int})?\])?(?: %{DATA:[error][code]}:)? %{GREEDYDATA:message}
# :long - %{INT:[process][thread][id]:int}
HTTPD_ERRORLOG %{HTTPD20_ERRORLOG}|%{HTTPD24_ERRORLOG}

# Deprecated
COMMONAPACHELOG %{HTTPD_COMMONLOG}
COMBINEDAPACHELOG %{HTTPD_COMBINEDLOG}

OK, I just tried exactly this with your sample log and it worked fine:

57.85.164.98 - - [20/Sep/2017:15:31:04 +0200] "GET /favicon.ico HTTP/1.1" 200 7581 "https://codingexplained.com/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/60.0.3112.90 Safari/537.36"

input { stdin { } }

filter {
  grok {
    match => { "message" => "%{HTTPD_COMBINEDLOG}" }
  }
  date {
    match => [ "timestamp" , "dd/MMM/yyyy:HH:mm:ss Z" ]
  }
}

output {
  elasticsearch { hosts => ["localhost:9200"] }
  stdout { codec => rubydebug }
}

Results:

{
    "@timestamp" => 2017-09-20T13:31:04Z,
        "source" => {
        "address" => "57.85.164.98"
    },
       "message" => "57.85.164.98 - - [20/Sep/2017:15:31:04 +0200] \"GET /favicon.ico HTTP/1.1\" 200 7581 \"https://codingexplained.com/\" \"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/60.0.3112.90 Safari/537.36\"",
     "timestamp" => "20/Sep/2017:15:31:04 +0200",
          "http" => {
         "request" => {
              "method" => "GET",
            "referrer" => "https://codingexplained.com/"
        },
        "response" => {
            "status_code" => 200,
                   "body" => {
                "bytes" => 7581
            }
        },
         "version" => "1.1"
    },
           "url" => {
        "original" => "/favicon.ico"
    },
    "user_agent" => {
        "original" => "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/60.0.3112.90 Safari/537.36"
    },
      "@version" => "1",
         "event" => {
        "original" => "57.85.164.98 - - [20/Sep/2017:15:31:04 +0200] \"GET /favicon.ico HTTP/1.1\" 200 7581 \"https://codingexplained.com/\" \"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/60.0.3112.90 Safari/537.36\""
    },
          "host" => {
        "hostname" => "hyperion"
    }
}

I followed a video course that is outdated and used its syntax 1:1, which is wrong with current versions.

The date section was the issue: [apache][access][time] does not exist, and the formatting pattern syntax was wrong as well.
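For reference, the date section that works with what grok actually produces (the HTTPD_COMBINEDLOG pattern stores the Apache time in a top-level timestamp field):

date {
    match => [ "timestamp", "dd/MMM/yyyy:HH:mm:ss Z" ]
}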

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.