I have configured GuardDuty to export findings to an S3 bucket. The files are newline-delimited JSON (JSON Lines) and are gzip-compressed. The metadata on the object is as follows:
Filebeat is configured using the s3 input plugin. I'm using Filebeat version 7.6.2. I found the line of code in Filebeat that's generating this error, but I can't figure out how to work around it.
If I download the file using aws s3 cp I can see that the file really is gzip-compressed, and it decompresses just fine on my local computer.
Could it be that the AWS SDK for Go is automatically decompressing the file? I found issue #1292 in aws-sdk-go, which says the default transport will decompress the object unless gzip is specified as an accepted encoding.
Filebeat does not specify gzip as an accepted encoding when it calls GetObjectRequest(), so maybe Filebeat is trying to decompress data that has already been decompressed.
The object metadata is populated by AWS GuardDuty when it writes objects into the bucket, and Filebeat acts on the new object as soon as it is Put into the bucket because it subscribes to an SQS queue of bucket update notifications.
I could overwrite the metadata for one object, but that doesn't really help with the ingest flow. I can't change the Content-Encoding that is used on new objects created by GuardDuty.
@mtojek I tested your theory by copying one of the files produced by GuardDuty into the same bucket, but with different metadata. This allowed Filebeat to ingest the file normally, and the events are visible in Kibana.
I did not change the content of the file. So it seems the data in S3 is valid gzip content, but Filebeat fails to process that content if the object's metadata includes the header Content-Encoding: gzip.
I'm still not sure how to make Filebeat work for ingesting GuardDuty logs. Should I have a Lambda that corrupts the Metadata on every object before delivering it to the queue for Filebeat? That does not seem good.