Logstash file output creating wrong file?


(Sam Barham) #1

As well as outputting to Elasticsearch, I have Logstash outputting to a gzipped file with:
file {
  path => "/var/data/logstash/archive-%{+YYYY-MM-dd}.log.gz"
  gzip => true
}

I then have a cronjob that runs a bit after midnight every day and moves any archive files from that directory into an Amazon S3 bucket. So far so good.
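The cron job is roughly the following sketch (the bucket name, paths, and use of the AWS CLI here are illustrative assumptions, not the actual script):

```shell
#!/bin/sh
# Illustrative sketch: shortly after midnight, move any finished
# archive files to S3. Bucket name and directory are hypothetical.
ARCHIVE_DIR=/var/data/logstash
BUCKET=s3://my-log-archive

for f in "$ARCHIVE_DIR"/archive-*.log.gz; do
    [ -e "$f" ] || continue               # skip if the glob matched nothing
    aws s3 mv "$f" "$BUCKET/$(basename "$f")"
done
```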

The problem is that pretty much every day, a log record gets written into yesterday's archive, after that archive has already been transferred to S3. For example, a file 'archive-2015-05-22.log.gz' is created containing a single log record with a timestamp of '2015-05-22T07:43:34.785Z' and a received_at of '2015-05-22 07:43:34 UTC', but the file itself is created at about 2015-05-23 07:43:34 UTC (note the changed day). This then gets uploaded to S3 and clobbers the previous archive file for that day.

I know I could 'solve' this by turning on versioning in the S3 bucket, thereby keeping both versions of the archive, but I'd much rather fix the underlying problem.

Note that I was having a similar, but worse, problem with an old version of logstash-forwarder: it was losing its progress on whole log files and resending them, which would create multiple archive files, one for each day the resent file had logs for. Upgrading to 0.4.0 has fixed that issue, and has either caused or revealed this one.
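For what it's worth, the date pattern in the file output's path is interpolated from each event's @timestamp, not from the wall clock at write time, so a late-arriving or replayed event stamped with yesterday's date reopens yesterday's file. A minimal Python sketch of that behaviour (a hypothetical helper for illustration, not Logstash code):

```python
from datetime import datetime, timezone

def archive_path(event_timestamp):
    """Mimic Logstash's %{+YYYY-MM-dd} sprintf: the date in the path
    comes from the event's @timestamp, not the current wall-clock time."""
    return event_timestamp.strftime("/var/data/logstash/archive-%Y-%m-%d.log.gz")

# An event replayed on the 23rd but stamped on the 22nd lands in the old file:
late_event = datetime(2015, 5, 22, 7, 43, 34, tzinfo=timezone.utc)
print(archive_path(late_event))  # /var/data/logstash/archive-2015-05-22.log.gz
```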

A (truncated) copy of the logstash config file follows:

input {
  lumberjack {
    port => 6782
    ssl_certificate => "..."
    ssl_key => "..."
  }
}

filter {

  if [type] == "tomcat" {
    grok {
      match => [ "message", "%{TIMESTAMP_ISO8601:time} | %{GREEDYDATA:appname} | [%{GREEDYDATA:thread}] | %{LOGLEVEL:loglevel} ? | %{GREEDYDATA:class} | [%{GREEDYDATA:mdc}] | [%{GREEDYDATA:msg}]" ]
      match => [ "message", "%{TIMESTAMP_ISO8601:time} | %{GREEDYDATA:appname} | [%{GREEDYDATA:thread}] | %{LOGLEVEL:loglevel} ? | %{GREEDYDATA:class} | %{GREEDYDATA:msg}" ]
      match => [ "message", "%{TIMESTAMP_ISO8601:time} | %{GREEDYDATA:appname} | [%{GREEDYDATA:thread}] | %{LOGLEVEL:loglevel} ? | %{GREEDYDATA:class} | [%{GREEDYDATA:msg}]" ]
      match => [ "message", "%{GREEDYDATA:msg}" ]
      add_field => [ "facility", "tomcat" ]
    }

    multiline {
      # If it doesn't start with a date, join it to the previous line
      pattern => "\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2},\d{3}\+\d{4} .*"
      negate => "true"
      what => "previous"
    }

    kv {
      source      => "mdc"
      field_split => ", "
    }

  }

  mutate {
    uppercase => [ "loglevel" ]
    add_field => [ "received_at", "%{@timestamp}" ]
    add_tag => ["mutated"]
  }

  date {
    match => [ "time", "ISO8601", "YYYY-MM-dd HH:mm:ss,SSS", "YYYY-MM-dd'T'HH:mm:ss,SSS", "MMM dd YYY HH:mm:ss", "MMM d YYY HH:mm:ss", "dd/MMM/YYYY:HH:mm:ss Z", "YYYY/MM/dd HH:mm:ss" ]
    target => "@timestamp"
    add_tag => ["date_changed"]
  }
}

output {

  file {
    path => "/var/data/logstash/archive-%{+YYYY-MM-dd}.log.gz"
    gzip => true
  }

  elasticsearch {
    cluster => "..."
    node_name => "..."
    host => "127.0.0.1"
    port => "9300"
    protocol => "transport"
  }

}


(Mark Walkom) #2

Why not just use the S3 output and save yourself a step?
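A minimal sketch of what that output block might look like (the bucket, region, and prefix are placeholders; option names follow the logstash-output-s3 plugin):

```
output {
  s3 {
    bucket => "my-log-archive"
    region => "us-east-1"
    prefix => "logstash/"
    time_file => 15
  }
}
```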


(Sam Barham) #3

Last time I looked at the S3 output it wasn't production ready, I think, but I'll definitely take a look now. Do you know if it's possible to compress its output?

