flomickl
(Florian)
February 17, 2023, 4:31pm
1
Hi,
I have some historic Tomcat log files like:
172.x.x.xx - - [19/Dec/2022:23:59:58 +0100] "POST /url/text/json HTTP/1.1" 200 348
I already have a Logstash grok pattern, but I have a problem with the correct timestamp: for some reason the timestamp field is mapped as text, while @timestamp is of course a date.
input {
  file {
    path => "/path/localhost_access_log*.txt"
    start_position => "beginning"
  }
}

filter {
  grok {
    match => { "message" => "%{IP:client_address} %{DATA:user} %{DATA:server} \[%{HTTPDATE:timestamp}\] \"%{WORD:request_method} %{URIPATH:url} %{DATA:content_type}\" %{NUMBER:status_code:int} %{NUMBER:duration:int}" }
  }
}

output {
  elasticsearch {
    hosts => ["http://localhost:9200"]
    index => "access"
    data_stream => "false"
  }
  stdout { codec => "rubydebug" }
}
"timestamp": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
{
"mappings": {
"properties": {
"@timestamp": {
"type": "date"
},
One way would be to map timestamp as date instead of text, but replacing @timestamp with it would be even better: the historic timestamp should become the @timestamp.
It would also be good if the index name carried the historic timestamp as a suffix:
index => "access_[TIMESTAMP]"
Any ideas?
flomickl
(Florian)
February 17, 2023, 4:51pm
2
By the way, the resulting event data is:
{
  "@timestamp": ["2023-02-17T16:45:29.998Z"],
  "@version": ["1"],
  "@version.keyword": ["1"],
  "client_address": ["172.x.x.x"],
  "client_address.keyword": ["172.x.x.x"],
  "content_type": ["HTTP/1.1"],
  "content_type.keyword": ["HTTP/1.1"],
  "duration": [350],
  "event.original": ["xxxx"],
  "event.original.keyword": ["xxxx"],
  "host.name": ["name"],
  "host.name.keyword": ["name"],
  "log.file.path": ["/path/localhost_access_log.2022-12-23.log"],
  "log.file.path.keyword": ["/path/localhost_access_log.2022-12-23.log"],
  "message": ["xxxx"],
  "message.keyword": ["xxx"],
  "request_method": ["POST"],
  "request_method.keyword": ["POST"],
  "server": ["-"],
  "server.keyword": ["-"],
  "status_code": [200],
  "timestamp": ["23/Dec/2022:04:13:55 +0100"],
  "timestamp.keyword": ["23/Dec/2022:04:13:55 +0100"],
  "url": ["/xxx/json"],
  "url.keyword": ["/xxx/json"],
  "user": ["-"],
  "user.keyword": ["-"],
  "_id": "C2REYIYB4iUevYI3PMeX",
  "_index": "access",
  "_score": null
}
Badger
February 17, 2023, 5:14pm
3
Use a date filter (note the lowercase yyyy for the year, per the date filter docs):

date { match => [ "timestamp", "dd/MMM/yyyy:HH:mm:ss Z" ] }

which will result in:

  "@timestamp" => 2022-12-23T03:13:55.000Z,
  "timestamp" => "23/Dec/2022:04:13:55 +0100"
For the index, you could set the index option on the output to "access_%{+YYYY.MM.dd}", but see here for why I do not think that is a good idea.
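A minimal sketch of the two pieces combined, reusing the hosts and options from the config above; the date-based index name is illustrative only, given the caveat just mentioned:

filter {
  date {
    # the date filter writes the parsed value to @timestamp by default
    match => [ "timestamp", "dd/MMM/yyyy:HH:mm:ss Z" ]
  }
}

output {
  elasticsearch {
    hosts => ["http://localhost:9200"]
    # sprintf date references like %{+YYYY.MM.dd} are expanded from
    # @timestamp, so the suffix is the historic date, not the ingest date
    index => "access_%{+YYYY.MM.dd}"
    data_stream => "false"
  }
}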
flomickl
(Florian)
February 21, 2023, 8:40pm
4
Ah, thanks for the hint!
I have now changed my logstash.conf to the following setup:
input {
  file {
    path => "/path/localhost_access_log.2022-12-21.txt"
    start_position => "beginning"
  }
}

filter {
  grok {
    match => {
      "message" => "%{IP:client_address} %{DATA:user} %{DATA:server} \[%{HTTPDATE:timestamp}\] \"%{WORD:request_method} %{URIPATH:url} %{DATA:content_type}\" %{NUMBER:status_code:int} %{NUMBER:duration:int}"
    }
  }
  date {
    match => [ "timestamp", "dd/MMM/yyyy:HH:mm:ss Z" ]
  }
  mutate {
    remove_field => ["message", "host", "file", "user", "server"]
  }
}

output {
  stdout { codec => "rubydebug" }
  if "_grokparsefailure" in [tags] {
    # write events that did not match to a file
    file { path => "/path/grok_failures.txt" }
  } else {
    elasticsearch {
      hosts => ["http://localhost:9200"]
      index => "access_logs"
      data_stream => "false"
    }
  }
}
Without the date filter it runs through; with the date filter it fails and the events get

"tags": ["_grokparsefailure"]

Do you see any reason why? The pattern looks alright:

21/Dec/2022:23:53:00 +0100

should match dd/MMM/yyyy:HH:mm:ss Z, as described in Date filter plugin | Logstash Reference [8.6] | Elastic.
Badger
February 21, 2023, 8:53pm
5
No way to tell without seeing at least the [message] field on one of the events where grok failed.
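One thing worth noting: a date filter that fails to parse adds "_dateparsefailure" to [tags], not "_grokparsefailure", so the tag you are seeing points at grok rather than at the date pattern. A minimal sketch that keeps the two failure modes visible (pattern taken from your config; tag_on_failure is the default value, spelled out only to make the distinction explicit):

filter {
  grok {
    # a non-matching grok pattern adds "_grokparsefailure" to [tags]
    match => { "message" => "%{IP:client_address} %{DATA:user} %{DATA:server} \[%{HTTPDATE:timestamp}\] \"%{WORD:request_method} %{URIPATH:url} %{DATA:content_type}\" %{NUMBER:status_code:int} %{NUMBER:duration:int}" }
  }
  date {
    match => [ "timestamp", "dd/MMM/yyyy:HH:mm:ss Z" ]
    # an unparseable [timestamp] would add this tag instead
    tag_on_failure => ["_dateparsefailure"]
  }
}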
mutate { remove_field => ["message", "host", "file", "user", "server"] }
If you do not want [user] and [server], then do not name the patterns in the grok filter:
"message" => "%{IP:client_address} %{DATA} %{DATA} ..."
Also, move the remove_field of [message] into the grok filter, so that if grok fails the field is not removed and you will still see it in /path/grok_failures.txt.
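Put together, the filter block could look something like this (a sketch based on the config above):

filter {
  grok {
    # [user] and [server] are still matched but no longer captured
    match => { "message" => "%{IP:client_address} %{DATA} %{DATA} \[%{HTTPDATE:timestamp}\] \"%{WORD:request_method} %{URIPATH:url} %{DATA:content_type}\" %{NUMBER:status_code:int} %{NUMBER:duration:int}" }
    # remove_field only runs when grok succeeds, so failed events keep
    # [message] and arrive intact in /path/grok_failures.txt
    remove_field => ["message"]
  }
  date {
    match => [ "timestamp", "dd/MMM/yyyy:HH:mm:ss Z" ]
  }
  mutate {
    remove_field => ["host", "file"]
  }
}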
flomickl
(Florian)
February 21, 2023, 10:24pm
6
Hmm, I thought I posted the message in my first post.
The message is:
"message" => "172.x.x.x - - [21/Dec/2022:22:05:16 +0100] \"POST /path/json HTTP/1.1\" 200 347",
The timestamp in Kibana Discover, if I do not use the date filter, is e.g.

"timestamp": [
  "30/Dec/2022:23:59:30 +0100"
],
and the grok_failure output itself looks like:

{
  "event": {
    "original": "172.xxx.xx.xx - - [21/Dec/2022:23:59:48 +0100] \"GET /url HTTP/1.1\" 200 4491"
  },
  "@timestamp": "2023-02-21T20:24:21.573794119Z",
  "tags": ["_grokparsefailure"],
  "log": {
    "file": {
      "path": "/path/localhost_access_log.2022-12-21.txt"
    }
  },
  "@version": "1"
}
Thanks for the additional hints as well!
Badger
February 21, 2023, 11:57pm
7
The only thing I can think of is that if your URL has a query string, then you should be matching URIPATHPARAM, not URIPATH.
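That would only change the URL pattern, e.g. (a sketch of the adjusted grok filter):

grok {
  # URIPATHPARAM matches a path with an optional query string,
  # e.g. /url?key=value, which URIPATH alone does not
  match => { "message" => "%{IP:client_address} %{DATA} %{DATA} \[%{HTTPDATE:timestamp}\] \"%{WORD:request_method} %{URIPATHPARAM:url} %{DATA:content_type}\" %{NUMBER:status_code:int} %{NUMBER:duration:int}" }
  remove_field => ["message"]
}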
system
(system)
Closed
March 22, 2023, 5:19pm
9
This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.