Sorry about this - I managed to delete my previous post somehow!
My issue is that I have a gzipped file output like:
file {
  path => "/var/data/logstash/archive-%{+YYYY-MM-dd}.log.gz"
  gzip => true
}
The files created by that output are transferred to S3 every day just after midnight. The problem is that the day after an archive has been created and transferred, the file is being recreated with a single log event from the previous day. For example, a gzip file named 'archive-2015-05-22.log.gz', containing a single event with timestamp '2015-05-22T07:43:34.785Z' and received_at '2015-05-22 07:43:34 UTC', gets created at approximately 2015-05-23 07:43:34 UTC (note the day has changed). When that new file is uploaded, it overwrites the proper archive for that day.
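From what I can tell, the %{+YYYY-MM-dd} in the path is sprintf'd from each event's @timestamp rather than from the wall clock, so a single straggler event carrying yesterday's timestamp is enough to recreate yesterday's file. As a workaround I'm considering diverting such events to a different path so they can't clobber the uploaded archive. A minimal sketch of what I mean (the is_old field and the late- prefix are names I've made up, and I'm assuming the ruby filter exposes the timestamp's underlying Ruby Time via .time):

filter {
  ruby {
    # Flag events whose @timestamp (UTC) day differs from today's.
    code => "event['is_old'] = event['@timestamp'].time.utc.strftime('%Y-%m-%d') != Time.now.utc.strftime('%Y-%m-%d')"
  }
}
output {
  if [is_old] {
    # Stragglers land in a separate file instead of overwriting the archive.
    file {
      path => "/var/data/logstash/late-%{+YYYY-MM-dd}.log.gz"
      gzip => true
    }
  } else {
    file {
      path => "/var/data/logstash/archive-%{+YYYY-MM-dd}.log.gz"
      gzip => true
    }
  }
}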
In the previous post, someone replied suggesting I use the S3 output instead, which I will investigate, although I see no option (yet) to compress the output.
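For reference, this is roughly the shape I'd expect that output to take, going by the s3 output docs (the bucket name is a placeholder, and the option names should be checked against whatever plugin version is installed); as far as I can see there is nothing equivalent to gzip => true:

output {
  s3 {
    access_key_id => "..."
    secret_access_key => "..."
    bucket => "my-log-archive"  # placeholder bucket name
    size_file => 2048           # rotate the temporary file once it reaches this many bytes
    time_file => 5              # and/or rotate every 5 minutes
  }
}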
The full Logstash config is:
input {
  lumberjack {
    port => 6782
    ssl_certificate => "..."
    ssl_key => "..."
  }
}
filter {
  if [type] == "tomcat" {
    grok {
      match => [ "message", "%{TIMESTAMP_ISO8601:time} \| %{GREEDYDATA:appname} \| \[%{GREEDYDATA:thread}\] \| %{LOGLEVEL:loglevel} ? \| %{GREEDYDATA:class} \| \[%{GREEDYDATA:mdc}\] \| \[%{GREEDYDATA:msg}\]" ]
      match => [ "message", "%{TIMESTAMP_ISO8601:time} \| %{GREEDYDATA:appname} \| \[%{GREEDYDATA:thread}\] \| %{LOGLEVEL:loglevel} ? \| %{GREEDYDATA:class} \| %{GREEDYDATA:msg}" ]
      match => [ "message", "%{TIMESTAMP_ISO8601:time} \| %{GREEDYDATA:appname} \| \[%{GREEDYDATA:thread}\] \| %{LOGLEVEL:loglevel} ? \| %{GREEDYDATA:class} \| \[%{GREEDYDATA:msg}\]" ]
      match => [ "message", "%{GREEDYDATA:msg}" ]
      add_field => [ "facility", "tomcat" ]
    }
    multiline {
      # If it doesn't start with a date, join it to the previous line
      pattern => "\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2},\d{3}\+\d{4} .*"
      negate => "true"
      what => "previous"
    }
    kv {
      source => "mdc"
      field_split => ", "
    }
  }
  mutate {
    uppercase => [ "loglevel" ]
    add_field => [ "received_at", "%{@timestamp}" ]
    add_tag => [ "mutated" ]
  }
  date {
    match => [ "time", "ISO8601", "YYYY-MM-dd HH:mm:ss,SSS", "YYYY-MM-dd'T'HH:mm:ss,SSS", "MMM dd YYY HH:mm:ss", "MMM d YYY HH:mm:ss", "dd/MMM/YYYY:HH:mm:ss Z", "YYYY/MM/dd HH:mm:ss" ]
    target => "@timestamp"
    add_tag => [ "date_changed" ]
  }
}
output {
  file {
    path => "/var/data/logstash/archive-%{+YYYY-MM-dd}.log.gz"
    gzip => true
  }
  elasticsearch {
    cluster => "..."
    node_name => "..."
    host => "127.0.0.1"
    port => "9300"
    protocol => "transport"
  }
}