Could it be that the aws-sdk for Go is automatically decompressing the file? I found aws-sdk-go issue #1292, which says the default transport will decompress the object unless gzip is explicitly specified as an accepted encoding.
And Filebeat does not specify gzip as an accepted encoding when it calls GetObjectRequest(), so maybe Filebeat is trying to decompress data that has already been decompressed.
The object metadata is populated by AWS GuardDuty when it writes objects into the bucket, and Filebeat acts on the new object as soon as it is Put into the bucket because it subscribes to an SQS queue of bucket update notifications.
I could overwrite the metadata for one object, but that doesn't really help with the ingest flow. I can't change the Content-Encoding that is used on new objects created by GuardDuty.
@mtojek I tested your theory by copying one of the files produced by GuardDuty into the same bucket, but with different metadata. This allowed Filebeat to ingest the file normally, and the events are visible in Kibana.
I did not change the content of the file. So it seems the data in S3 is valid GZIP content, but Filebeat fails to process that GZIP content when the header is Content-Encoding: gzip.
I'm still not sure how to make Filebeat work for ingesting GuardDuty logs. Should I add a Lambda that rewrites the metadata on every object before the notification is delivered to the queue for Filebeat? That does not seem like a good solution.