GZIP invalid header with Filebeat S3 input and GuardDuty logs

I'm trying to ingest AWS GuardDuty findings with Filebeat, but the s3 input logs ERROR messages for every object it processes:

2020-05-20T22:42:27.973Z	ERROR	[s3]	s3/input.go:447	gzip.NewReader failed: gzip: invalid header
2020-05-20T22:42:27.974Z	ERROR	[s3]	s3/input.go:386	createEventsFromS3Info failed for AWSLogs/123456789123/GuardDuty/ca-central-1/2020/05/15/659b5608-a71c-3b42-8979-f851e61d9098.jsonl.gz: gzip.NewReader failed: gzip: invalid header
2020-05-20T22:42:27.974Z	WARN	[s3]	s3/input.go:277	Processing message failed, updating visibility timeout
2020-05-20T22:42:28.005Z	ERROR	[s3]	s3/input.go:447	gzip.NewReader failed: gzip: invalid header
2020-05-20T22:42:28.005Z	ERROR	[s3]	s3/input.go:386	createEventsFromS3Info failed for AWSLogs/123456789123/GuardDuty/ca-central-1/2020/05/14/1557fa6e-f5f8-36ea-add9-28070f1ff7ee.jsonl.gz: gzip.NewReader failed: gzip: invalid header

I have configured GuardDuty to export findings to an S3 bucket. The files contain newline-delimited JSON (JSON Lines) and are gzip-compressed. The metadata on the objects is as follows:

Content-Encoding: gzip
Content-Type: application/json

Filebeat is configured with the s3 input plugin, version 7.6.2. I found the line of code in Filebeat that generates this error, but I can't figure out how to work around it.

If I download the file using aws s3 cp, I can see that the file really is gzip-compressed, and it decompresses just fine on my local machine.

Could it be that the AWS SDK for Go is automatically decompressing the file? I found aws-sdk-go issue #1292, which says the default HTTP transport transparently decompresses the object body unless gzip is explicitly specified as an accepted encoding.

Filebeat does not specify gzip as an accepted encoding when it calls GetObjectRequest(), so it may be trying to decompress data that has already been decompressed.
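That theory would explain the exact error text: Go's gzip.NewReader rejects a body that is no longer gzip-compressed. A minimal sketch of what the s3 input presumably does with the object body (the helper names here are mine, not Filebeat's):

```go
package main

import (
	"bytes"
	"compress/gzip"
	"fmt"
	"io"
)

// gzipCompress returns data compressed with gzip.
func gzipCompress(data []byte) []byte {
	var buf bytes.Buffer
	zw := gzip.NewWriter(&buf)
	zw.Write(data)
	zw.Close()
	return buf.Bytes()
}

// tryGunzip mimics the s3 input: wrap the downloaded body in gzip.NewReader
// because the object key ends in .gz.
func tryGunzip(body []byte) (string, error) {
	zr, err := gzip.NewReader(bytes.NewReader(body))
	if err != nil {
		return "", fmt.Errorf("gzip.NewReader failed: %w", err)
	}
	defer zr.Close()
	out, err := io.ReadAll(zr)
	return string(out), err
}

func main() {
	plain := []byte(`{"schemaVersion":"2.0"}` + "\n")

	// An already-decompressed body (what the SDK would hand back after
	// transparent decoding) is rejected with "gzip: invalid header".
	if _, err := tryGunzip(plain); err != nil {
		fmt.Println(err)
	}

	// A genuinely gzip-compressed body decompresses fine.
	if out, err := tryGunzip(gzipCompress(plain)); err == nil {
		fmt.Print(out)
	}
}
```

Running this prints "gzip.NewReader failed: gzip: invalid header" for the plain body, which matches the messages in my Filebeat log.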

You can try deleting the "Content-Encoding" entry from the object metadata. The data that Filebeat downloads is not valid gzip content.

@mtojek I'm not sure I understand your suggestion.

The object metadata is populated by AWS GuardDuty when it writes objects into the bucket, and Filebeat acts on each new object as soon as it is put into the bucket, because it subscribes to an SQS queue of bucket notification events.

I could overwrite the metadata on one object by hand, but that doesn't help the ingest flow: I can't change the Content-Encoding that GuardDuty sets on new objects.

@mtojek I tested your theory by copying one of the files produced by GuardDuty into the same bucket with different metadata. That allowed Filebeat to ingest the file normally, and the events are visible in Kibana.

aws --profile root s3 cp --metadata Content-Encoding=faketest s3://my-guardduty-bucket/AWSLogs/123456789123/GuardDuty/ca-central-1/2020/05/14/017c8c8f-7ad6-3da5-9c53-8c08ba35b370.jsonl.gz s3://my-guardduty-bucket/AWSLogs/123456789123/GuardDuty/ca-central-1/2020/05/14/017c8c8f-7ad6-3da5-9c53-8c08ba35b370-3.jsonl.gz

I did not change the content of the file. So the data in S3 is valid gzip content, but Filebeat fails to process it whenever the object's metadata says Content-Encoding: gzip.
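This is consistent with the transparent-decompression behavior described in aws-sdk-go issue #1292. The effect is reproducible with a small, self-contained Go sketch, using httptest as a stand-in for S3 (the server and helper names are mine, for illustration only):

```go
package main

import (
	"bytes"
	"compress/gzip"
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
)

// newGzipServer mimics S3 serving a GuardDuty object: a gzip-compressed
// body accompanied by a Content-Encoding: gzip header.
func newGzipServer(payload []byte) *httptest.Server {
	var gz bytes.Buffer
	zw := gzip.NewWriter(&gz)
	zw.Write(payload)
	zw.Close()
	body := gz.Bytes()
	return httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.Header().Set("Content-Encoding", "gzip")
		w.Header().Set("Content-Type", "application/json")
		w.Write(body)
	}))
}

// fetchBody gets the URL, optionally setting Accept-Encoding: gzip
// explicitly on the request.
func fetchBody(url string, explicitGzip bool) ([]byte, error) {
	req, err := http.NewRequest("GET", url, nil)
	if err != nil {
		return nil, err
	}
	if explicitGzip {
		// An explicit Accept-Encoding disables the transport's
		// transparent gzip decoding.
		req.Header.Set("Accept-Encoding", "gzip")
	}
	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()
	return io.ReadAll(resp.Body)
}

func main() {
	srv := newGzipServer([]byte(`{"finding":"example"}` + "\n"))
	defer srv.Close()

	// Default request: the transport adds Accept-Encoding: gzip itself and
	// transparently decompresses, so the caller sees plain JSON. Handing
	// this to gzip.NewReader gives "gzip: invalid header".
	plain, _ := fetchBody(srv.URL, false)
	fmt.Printf("default:  %q\n", plain)

	// Explicit Accept-Encoding: gzip: the raw gzip bytes (magic 1f 8b)
	// arrive untouched, ready for gzip.NewReader.
	raw, _ := fetchBody(srv.URL, true)
	fmt.Printf("explicit: first bytes %x\n", raw[:2])
}
```

If this is what's happening inside the SDK's HTTP client, the fix on Filebeat's side would be to stop the transport from stripping the gzip layer before the input tries to decompress it.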

I'm still not sure how to make Filebeat ingest GuardDuty logs. Should I run a Lambda that rewrites the metadata on every object before the notification reaches Filebeat's queue? That doesn't seem like a good solution.

I'm fairly sure this is a bug in the Filebeat s3 input, so I reported it on the GitHub issue tracker.