Logstash-codec-cloudtrail hangs when parsing a large file


(Nikhil Owalekar) #1

I'm seeing a problem with logstash-codec-cloudtrail where processing simply hangs, with no error or debug logs, whenever the codec encounters a large file.

I tried enabling debug logging for the codec, but nothing is printed:

curl -XPUT 'localhost:9600/_node/logging?pretty' -H 'Content-Type: application/json' -d'{"logger.logstash.codecs.cloudtrail" : "DEBUG"}'
  • Logstash version 5.5.1
  • Codec Version: 3.0.4
  • Operating System: Ubuntu 14.04
  • Config File (a quick bucket-listing sanity check is sketched right after this block)
    s3 {
      region => 'us-east-1'
      bucket => '<my-org>-logs'
      backup_to_bucket => '<my-org>-logs'
      backup_add_prefix => 'processed/'
      delete => true
      interval => 300
      tags => ['aws-input', 'cloudtrail']
      type => 'cloudtrail'
      codec => 'cloudtrail'
      prefix => 'cloudtrail/'
      sincedb_path => '/opt/logstash/server/sincedb/cloudtrail'
    }
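
As a sanity check outside Logstash, the objects this input would pick up can be listed independently to confirm which ones are in the multi-megabyte range. This is only a sketch; boto3 is just one way to do it, and the bucket, prefix, and region below simply mirror the config above.

# Sketch: list the objects under the same bucket/prefix as the s3 input above,
# to confirm their sizes and last-modified times.
import boto3

s3 = boto3.client("s3", region_name="us-east-1")
resp = s3.list_objects_v2(Bucket="<my-org>-logs", Prefix="cloudtrail/")
for obj in resp.get("Contents", []):
    print(obj["LastModified"], obj["Size"], obj["Key"])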

Sample Data:

Here's the list of files we have in the S3 bucket:

2018-05-21 05:32:14      21408 20180521T0000Z_oueDeCc9ryuFaNE2.json.gz
2018-05-21 07:07:23      10581 20180521T0130Z_2C9gPDzKtmwp1sO3.json.gz
2018-05-21 07:04:22    5264114 20180521T0135Z_7zhrUZGpPj8c9rnb.json.gz
2018-05-21 07:12:09      13128 20180521T0135Z_b9h4v5QqEkumMZNu.json.gz
2018-05-21 07:08:06      29622 20180521T0135Z_gY3u2wcdDT3DjPY9.json.gz
2018-05-21 07:08:05      42110 20180521T0135Z_uOFgvOohWqh7pCKm.json.gz
2018-05-21 07:07:13      42502 20180521T0140Z_2TX8v5UumEV24fgg.json.gz
2018-05-21 07:17:28      10593 20180521T0140Z_UQVPTdRJ7OGIpeQu.json.gz
2018-05-21 07:09:28    4841248 20180521T0140Z_ZV0HXfgBNseHi2cG.json.gz
2018-05-21 07:12:32      58228 20180521T0140Z_j8gNtuBoG91ftY6J.json.gz
2018-05-21 07:13:29      33323 20180521T0140Z_jBjTddHPURNw0wDp.json.gz
2018-05-21 07:17:43      45539 20180521T0145Z_28lYKm6deu5M9fPf.json.gz
2018-05-21 07:17:21      37363 20180521T0145Z_MuvtNRJAgTgjsIjq.json.gz
2018-05-21 07:12:22    5245924 20180521T0145Z_kCpHWvq3Hlua803U.json.gz
2018-05-21 07:22:40      12516 20180521T0145Z_kkJAyDaUNgv2LFLK.json.gz
2018-05-21 07:12:23     109264 20180521T0145Z_zrOp34x50ibxvQNT.json.gz
2018-05-21 07:16:04    5257312 20180521T0150Z_3KaopDSL1sGxg6vf.json.gz
2018-05-21 07:17:25     252268 20180521T0150Z_CIrZORIB3WFCVN9s.json.gz
2018-05-21 07:21:08    3119643 20180521T0150Z_ERpgl6PvHjkY90QB.json.gz

At first, the sincedb was stuck at 01:34, and this file was sitting in /tmp/logstash:
20180521T0135Z_7zhrUZGpPj8c9rnb.json.gz, which is about 5 MB.
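
It may be worth decompressing and parsing that temp file outside Logstash to rule out a corrupt object. A minimal sketch, assuming the usual CloudTrail layout of a gzipped JSON document with a top-level Records array:

# Quick check that the object Logstash appears stuck on is valid gzip and JSON.
# Adjust the path to whatever temp file shows up on your host.
import gzip, json

path = "/tmp/logstash/20180521T0135Z_7zhrUZGpPj8c9rnb.json.gz"
with gzip.open(path, "rt") as f:
    doc = json.load(f)
print(len(doc.get("Records", [])), "records parsed")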

No processing or logging was seen beyond that timestamp for over 6 hours,
so I stopped Logstash and set the sincedb to 01:37 to skip that file.

After doing that, Logstash got stuck on 20180521T0140Z_ZV0HXfgBNseHi2cG.json.gz, which is about 4.8 MB.

This kept happening until I had also skipped 20180521T0150Z_3KaopDSL1sGxg6vf.json.gz from the list above, which is about 5 MB.
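
For reference, "skipping" a file here just means overwriting the sincedb by hand. A rough sketch of that workaround, assuming the s3 input keeps a single timestamp in that file (check the format on your installation before touching it, and stop Logstash first):

# Hypothetical sketch of the skip workaround: bump the s3 input's sincedb past
# the problematic object so the next poll ignores it.
# Assumes the file holds one parseable timestamp; stop Logstash before editing.
from pathlib import Path

sincedb = Path("/opt/logstash/server/sincedb/cloudtrail")  # sincedb_path from the config
print("before:", sincedb.read_text().strip())
sincedb.write_text("2018-05-21 01:37:00 UTC")  # e.g. move past the 01:35 file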

Steps to Reproduce:

  • Have the codec parse a file larger than 2 MB (a sketch for generating such a file is below)
  • The codec hangs without producing any output
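
For anyone trying to reproduce this without access to the bucket, here is a rough sketch that writes a CloudTrail-shaped .json.gz comfortably past the 2 MB mark. The individual record fields are made-up placeholders; only the top-level Records array mirrors the real CloudTrail file layout.

# Rough generator for a CloudTrail-shaped object large enough to trigger the hang.
# Record fields are placeholders; the per-record UUIDs keep gzip from shrinking it.
import gzip, json, os, uuid

def make_record():
    return {
        "eventVersion": "1.05",
        "eventID": str(uuid.uuid4()),
        "eventSource": "s3.amazonaws.com",
        "eventName": "GetObject",
        "awsRegion": "us-east-1",
        "requestID": uuid.uuid4().hex,
    }

path = "large_cloudtrail_sample.json.gz"
with gzip.open(path, "wt") as f:
    json.dump({"Records": [make_record() for _ in range(50000)]}, f)

print(os.path.getsize(path), "bytes compressed")  # should be well past 2 MB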

Please note:

  • Other S3 inputs (ELB and CloudFront logs) work fine in the same Logstash instance.
  • Filenames in the example above have been simplified to emphasize timestamps and file sizes.

(system) #2

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.