Historic Tomcat access logs: transform the historic timestamp into @timestamp

Hi,

I have some historic tomcat log files like
172.x.x.xx - - [19/Dec/2022:23:59:58 +0100] "POST /url/text/json HTTP/1.1" 200 348

I already have a Logstash grok pattern, but I have a problem getting the correct timestamp.
The timestamp field ends up mapped as type text for some reason, while @timestamp is of course a date.

input {
  file {
    path => "/path/localhost_access_log*.txt"
    start_position => "beginning"
  }
}


filter {
    grok {
      match => { "message" => "%{IP:client_address} %{DATA:user} %{DATA:server} \[%{HTTPDATE:timestamp}\] \"%{WORD:request_method} %{URIPATH:url} %{DATA:content_type}\" %{NUMBER:status_code:int} %{NUMBER:duration:int}" }
    }
}


output {
  elasticsearch {
    hosts => ["http://localhost:9200"]
    index => "access"
    data_stream => "false"
  }
  stdout {codec => "rubydebug"}
}

 "timestamp": {
        "type": "text",
        "fields": {
          "keyword": {
            "type": "keyword",
            "ignore_above": 256
          }
        }
      },
{
  "mappings": {
    "properties": {
      "@timestamp": {
        "type": "date"
      },

One way would be to map timestamp as a date instead of text, but replacing @timestamp with it would be even better.
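For reference, an explicit date mapping for that field would presumably look something like this (just a sketch, assuming the dd/MMM/yyyy:HH:mm:ss Z format from the log line):

      "timestamp": {
        "type": "date",
        "format": "dd/MMM/yyyy:HH:mm:ss Z"
      }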

To summarize what I would like:

  1. The historic timestamp should become @timestamp.
  2. It would also be good if the index name carried the historic timestamp as a suffix, e.g.

index => "access_[TIMESTAMP]"

Any ideas?

By the way, the resulting data is:

{
  "@timestamp": [
    "2023-02-17T16:45:29.998Z"
  ],
  "@version": [
    "1"
  ],
  "@version.keyword": [
    "1"
  ],
  "client_address": [
    "172.x.x.x"
  ],
  "client_address.keyword": [
    "172.x.x.x"
  ],
  "content_type": [
    "HTTP/1.1"
  ],
  "content_type.keyword": [
    "HTTP/1.1"
  ],
  "duration": [
    350
  ],
  "event.original": [
    "xxxx"
  ],
  "event.original.keyword": [
    "xxxx"
  ],
  "host.name": [
    "name"
  ],
  "host.name.keyword": [
    "name"
  ],
  "log.file.path": [
    "/path/localhost_access_log.2022-12-23.log"
  ],
  "log.file.path.keyword": [
    "/path/localhost_access_log.2022-12-23.log"
  ],
  "message": [
    "xxxx"
  ],
  "message.keyword": [
    "xxx"
  ],
  "request_method": [
    "POST"
  ],
  "request_method.keyword": [
    "POST"
  ],
  "server": [
    "-"
  ],
  "server.keyword": [
    "-"
  ],
  "status_code": [
    200
  ],
  "timestamp": [
    "23/Dec/2022:04:13:55 +0100"
  ],
  "timestamp.keyword": [
    "23/Dec/2022:04:13:55 +0100"
  ],
  "url": [
    "/xxx/json"
  ],
  "url.keyword": [
    "/xxx/json"
  ],
  "user": [
    "-"
  ],
  "user.keyword": [
    "-"
  ],
  "_id": "C2REYIYB4iUevYI3PMeX",
  "_index": "access",
  "_score": null
}

Use a date filter

 date { match => [ "timestamp", "dd/MMM/YYYY:HH:mm:ss Z" ] }

will result in

"@timestamp" => 2022-12-23T03:13:55.000Z,
 "timestamp" => "23/Dec/2022:04:13:55 +0100"

For the index, you could set the index option on the output to "access_%{+YYYY.MM.dd}", but see here for why I do not think that is a good idea.
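Putting that together, a minimal sketch of the date filter could look like this (untested; the date filter writes to @timestamp by default, its remove_field is only applied when the parse succeeds, and the reference docs use lowercase yyyy for the year):

    date {
      match => [ "timestamp", "dd/MMM/yyyy:HH:mm:ss Z" ]
      # default target is @timestamp, so the historic time replaces the ingest time
      remove_field => [ "timestamp" ]   # drop the raw text field once it has been parsed
    }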

Ah, thanks for the hint.

I changed my logstash.conf now to the following setup:

input {
  file {
    path => "/path/localhost_access_log.2022-12-21.txt"
    start_position => "beginning"
  }
}


filter {
    grok {
      match => {
        "message" => "%{IP:client_address} %{DATA:user} %{DATA:server} \[%{HTTPDATE:timestamp}\] \"%{WORD:request_method} %{URIPATH:url} %{DATA:content_type}\" %{NUMBER:status_code:int} %{NUMBER:duration:int}"
      }
    }

    date {
       match => [ "timestamp", "dd/MMM/YYYY:HH:mm:ss Z" ]
    }

    mutate {
        remove_field => ["message", "host", "file", "user", "server"]
    }
}


output {
     stdout {codec => "rubydebug"}

   if "_grokparsefailure" in [tags] {
      # write events that didn't match to a file
      file { "path" => "/path/grok_failures.txt" }
   } else {
     elasticsearch {
       hosts => ["http://localhost:9200"]
       index => "access_logs"
       data_stream => "false"
     }
   }
}

Without the date filter it runs through; with the date filter it fails and the events get
"tags":["_grokparsefailure"]

Do you see any reason why?
The pattern looks alright to me:

21/Dec/2022:23:53:00 +0100 should match dd/MMM/YYYY:HH:mm:ss Z

as described here: Date filter plugin | Logstash Reference [8.6] | Elastic

No way to tell without seeing at least the [message] field of one of the events where grok failed.

mutate { remove_field => ["message", "host", "file", "user", "server"] }

If you do not want [user] and [server] then do not name the patterns in the grok filter

"message" => "%{IP:client_address} %{DATA} %{DATA} ..."

Also, move the remove_field for the [message] field into the grok filter, so that if grok fails the field is not removed and you will still see it in /path/grok_failures.txt.
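Combined, a sketch of that grok filter (untested):

    grok {
      match => {
        "message" => "%{IP:client_address} %{DATA} %{DATA} \[%{HTTPDATE:timestamp}\] \"%{WORD:request_method} %{URIPATH:url} %{DATA:content_type}\" %{NUMBER:status_code:int} %{NUMBER:duration:int}"
      }
      # remove_field here is only applied when the match succeeds,
      # so failed events keep their [message] for debugging
      remove_field => [ "message" ]
    }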

Hmm, I thought I posted the message in my first post.

The message is:

 "message" => "172.x.x.x - - [21/Dec/2022:22:05:16 +0100] \"POST /path/json HTTP/1.1\" 200 347",

If I do not use the date filter, the timestamp in Kibana Discover is e.g.

  "timestamp": [
    "30/Dec/2022:23:59:30 +0100"
  ],

and the grok_failure output itself looks like:

{"event":{"original":"172.xxx.xx.xx - - [21/Dec/2022:23:59:48 +0100] \"GET /url HTTP/1.1\" 200 4491"},"@timestamp":"2023-02-21T20:24:21.573794119Z","tags":["_grokparsefailure"],"log":{"file":{"path":"/path/localhost_access_log.2022-12-21.txt"}},"@version":"1"}

Thanks for the additional hints as well!!!!

The only thing I can think of is that if your URL has a query string then you should be matching URIPATHPARAM, not URIPATH.
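Only the url part of the pattern would change, roughly (a sketch based on the pattern posted above):

    "message" => "%{IP:client_address} %{DATA:user} %{DATA:server} \[%{HTTPDATE:timestamp}\] \"%{WORD:request_method} %{URIPATHPARAM:url} %{DATA:content_type}\" %{NUMBER:status_code:int} %{NUMBER:duration:int}"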
