Input data:
The input is a static file containing 10 lines of access logs.
If there is a way to share a file, please let me know and I can share the input file.
Config:
input {
  file {
    sincedb_path => "/opt/analytics/logstash/sincedb/da_neat.sincedb"
    path => ["/opt/analytics/logstash/conf/neat_test"]
    start_position => "beginning"
    exclude => "*.gz"
    type => "da_neat"
  }
}
output {
  file {
    codec => "plain"
    path => "/opt/analytics/logstash/data/da_neat/da_neat02-%{+YYYY-MM-dd-HH}.json"
  }
}
#1: With the above input and config, I get 9 lines in the output instead of 10. This happens with any file: the last line is always skipped, and I am not sure why.
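For repeatable test runs against a static file, a debugging variant of the config above may help narrow this down. This is only a sketch under stated assumptions: it assumes the file input emits a line only once it has seen the trailing newline (so a last line without one would be held back), it points sincedb_path at /dev/null so that each run re-reads the file from the beginning, and it adds a stdout output with the rubydebug codec so that every event is printed as it is emitted.

```
input {
  file {
    path => ["/opt/analytics/logstash/conf/neat_test"]
    start_position => "beginning"
    # Assumption: discarding the sincedb makes every run re-read the whole file.
    sincedb_path => "/dev/null"
    type => "da_neat"
  }
}
output {
  # Print each event in full, so the event count can be checked
  # independently of the file output.
  stdout { codec => rubydebug }
}
```

If all 10 events show up on stdout once the input file ends with a newline, the skipped last line is most likely a delimiter issue rather than a filter or output problem.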
#2: After adding the grok, urldecode and kv filters, I got 10 records, which matches the input line count. That is good.
grok {
  # The pattern is single-quoted so the literal double quotes need no escaping;
  # the square brackets around the timestamp are escaped to match literally.
  match => { "message" => '%{DATA:remote_addr} %{DATA:attr1} %{DATA:remote_user} \[%{HTTPDATE:server_timestamp}\] "%{DATA:attr2}" %{NUMBER:status} (?:%{NUMBER:body_bytes_sent}|-) "%{DATA:http_referer}" "%{DATA:http_user_agent}" "%{DATA:http_x_forwarded_for}" "%{DATA:request_body}"' }
}
kv {
  source => "request_body"
  field_split => "&"
}
urldecode {
  all_fields => true
}
#3: Then I added the date filter to parse a field in the log. This results in the first and seventh records missing from the output (8 records in total).
date {
  match => [ "server_timestamp", "dd/MMM/yyyy:HH:mm:ss Z" ]
}
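One thing that may be worth checking at this step: by default the date filter writes the parsed time into @timestamp, and the %{+YYYY-MM-dd-HH} part of the output path is formatted from @timestamp. Events whose server_timestamp falls in a different hour could therefore land in a different output file rather than being dropped. The sketch below keeps @timestamp untouched by writing the parsed value to a separate field. The field name server_time is a hypothetical choice, not something from the original config.

```
date {
  match => [ "server_timestamp", "dd/MMM/yyyy:HH:mm:ss Z" ]
  # "server_time" is an illustrative field name. Without target,
  # the filter overwrites @timestamp with the parsed value.
  target => "server_time"
}
```

Listing the output directory for other da_neat02-*.json files would be a quick way to confirm or rule this out.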
#4: Then I added a mutate filter to remove unwanted fields:
mutate {
  remove_field => [ "message", "@version", "type", "host", "path", "request_body", "attr1", "attr2", "status", "body_bytes_sent", "http_x_forwarded_for" ]
}
This further decreased the output line count to 6.
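To see whether any events fail parsing along the way, one option is to route events that carry the default failure tags into a separate file. This is a diagnostic sketch, not a fix: it assumes the grok and date filters keep their default failure tags (_grokparsefailure and _dateparsefailure), and the da_neat02-failed.json file name is hypothetical.

```
output {
  if "_grokparsefailure" in [tags] or "_dateparsefailure" in [tags] {
    # Hypothetical file name for events that failed grok or date parsing.
    file {
      path => "/opt/analytics/logstash/data/da_neat/da_neat02-failed.json"
    }
  } else {
    file {
      codec => "plain"
      path => "/opt/analytics/logstash/data/da_neat/da_neat02-%{+YYYY-MM-dd-HH}.json"
    }
  }
}
```

Comparing the line counts of the two files against the 10 input lines shows whether events are missing or only ending up somewhere unexpected.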
Can someone please help me understand this behavior of Logstash?
Thanks,
Mihir Ray