I'm seeing a problem with logstash-codec-cloudtrail: processing hangs, with no error or debug output, whenever the codec encounters a large file.
I tried enabling debug logging for the codec, but nothing is printed:
curl -XPUT 'localhost:9600/_node/logging?pretty' -H 'Content-Type: application/json' -d'{"logger.logstash.codecs.cloudtrail" : "DEBUG"}'
- Logstash version: 5.5.1
- Codec version: 3.0.4
- Operating system: Ubuntu 14.04
- Config file:
s3 {
  region => 'us-east-1'
  bucket => '<my-org>-logs'
  backup_to_bucket => '<my-org>-logs'
  backup_add_prefix => 'processed/'
  delete => true
  interval => 300
  tags => ['aws-input', 'cloudtrail']
  type => 'cloudtrail'
  codec => 'cloudtrail'
  prefix => 'cloudtrail/'
  sincedb_path => '/opt/logstash/server/sincedb/cloudtrail'
}
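To check whether the hang is in plain gunzip + JSON parsing (as opposed to something specific to the codec or the s3 input), I put together the sketch below. It is not the codec's actual code, only an approximation of what it conceptually does with a CloudTrail file: decompress, parse, and pull out the `Records` array. The synthetic payload and field values are illustrative, sized to a few MB uncompressed to mimic the problem files.

```ruby
# Standalone sanity check: gunzip + parse a CloudTrail-shaped payload of
# roughly the size that hangs the codec. Payload contents are synthetic.
require 'zlib'
require 'json'

# Build a CloudTrail-shaped document of ~3 MB uncompressed.
record = { 'eventVersion' => '1.05', 'eventName' => 'DescribeInstances',
           'awsRegion' => 'us-east-1',
           'requestParameters' => { 'filter' => 'x' * 200 } }
payload = JSON.generate('Records' => Array.new(10_000) { record })
compressed = Zlib.gzip(payload)

# Decode the way the codec conceptually does: gunzip, parse, split Records.
started = Time.now
records = JSON.parse(Zlib.gunzip(compressed)).fetch('Records', [])
puts "parsed #{records.length} records in #{'%.2f' % (Time.now - started)} s"
```

On my machine this finishes in well under a second, which suggests the hang is not in Ruby's Zlib/JSON handling of files this size by itself.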
Sample data:
Here is the list of files in the S3 bucket:
2018-05-21 05:32:14 21408 20180521T0000Z_oueDeCc9ryuFaNE2.json.gz
2018-05-21 07:07:23 10581 20180521T0130Z_2C9gPDzKtmwp1sO3.json.gz
2018-05-21 07:04:22 5264114 20180521T0135Z_7zhrUZGpPj8c9rnb.json.gz
2018-05-21 07:12:09 13128 20180521T0135Z_b9h4v5QqEkumMZNu.json.gz
2018-05-21 07:08:06 29622 20180521T0135Z_gY3u2wcdDT3DjPY9.json.gz
2018-05-21 07:08:05 42110 20180521T0135Z_uOFgvOohWqh7pCKm.json.gz
2018-05-21 07:07:13 42502 20180521T0140Z_2TX8v5UumEV24fgg.json.gz
2018-05-21 07:17:28 10593 20180521T0140Z_UQVPTdRJ7OGIpeQu.json.gz
2018-05-21 07:09:28 4841248 20180521T0140Z_ZV0HXfgBNseHi2cG.json.gz
2018-05-21 07:12:32 58228 20180521T0140Z_j8gNtuBoG91ftY6J.json.gz
2018-05-21 07:13:29 33323 20180521T0140Z_jBjTddHPURNw0wDp.json.gz
2018-05-21 07:17:43 45539 20180521T0145Z_28lYKm6deu5M9fPf.json.gz
2018-05-21 07:17:21 37363 20180521T0145Z_MuvtNRJAgTgjsIjq.json.gz
2018-05-21 07:12:22 5245924 20180521T0145Z_kCpHWvq3Hlua803U.json.gz
2018-05-21 07:22:40 12516 20180521T0145Z_kkJAyDaUNgv2LFLK.json.gz
2018-05-21 07:12:23 109264 20180521T0145Z_zrOp34x50ibxvQNT.json.gz
2018-05-21 07:16:04 5257312 20180521T0150Z_3KaopDSL1sGxg6vf.json.gz
2018-05-21 07:17:25 252268 20180521T0150Z_CIrZORIB3WFCVN9s.json.gz
2018-05-21 07:21:08 3119643 20180521T0150Z_ERpgl6PvHjkY90QB.json.gz
At first, the sincedb was stuck at 01:34, and the file 20180521T0135Z_7zhrUZGpPj8c9rnb.json.gz (about 5 MB) was sitting in /tmp/logstash.
No processing or logs appeared beyond that timestamp for over 6 hours.
So I stopped Logstash and set the sincedb to 01:37 to skip that file.
After that, Logstash got stuck on 20180521T0140Z_ZV0HXfgBNseHi2cG.json.gz, which is about 4.8 MB.
This kept happening until I had also skipped 20180521T0150Z_3KaopDSL1sGxg6vf.json.gz (about 5 MB) from the list above.
Steps to reproduce:
- Have the codec parse a gzipped CloudTrail file larger than about 2 MB
- The codec hangs; no events or logs are produced beyond that file
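To make the repro self-contained without real account data, a fixture file can be generated with a sketch like the one below. The output filename, record fields, and sizes are all hypothetical; it just produces a CloudTrail-shaped .json.gz whose compressed size exceeds the ~2 MB threshold described above, suitable for uploading to the test bucket.

```ruby
# Hedged helper: generate a CloudTrail-style .json.gz fixture larger than
# 2 MB compressed, for reproducing the hang. All field values are synthetic.
require 'zlib'
require 'json'
require 'securerandom'

OUT = 'repro_large_cloudtrail.json.gz' # hypothetical fixture name

# Random blobs compress poorly, so the compressed file stays large.
records = Array.new(8_000) do
  { 'eventVersion' => '1.05',
    'eventID' => SecureRandom.uuid,
    'eventName' => 'PutObject',
    'awsRegion' => 'us-east-1',
    'responseElements' => { 'blob' => SecureRandom.hex(512) } }
end

Zlib::GzipWriter.open(OUT) do |gz|
  gz.write(JSON.generate('Records' => records))
end

puts "wrote #{OUT}: #{File.size(OUT)} bytes compressed"
```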
Please note:
- Other S3 inputs (ELB and CloudFront logs) are functional in the same Logstash instance.
- Filenames in the example above have been simplified to emphasize the timestamps and file sizes.