Filebeat aws readStringAndTrimDelimiter failed: context deadline exceeded in logs

Hi,

I'm running filebeat 7.9.0, sending logs to logstash 7.9.0 before they are handed over to ES 7.9.0. For filbeat, I have the AWS module configured to retrieve elblogs.
It took me a while to get it setup properly, i.e. for the SQS queue, not only ReceiveMessage, but also ChangeMessageVisibility and DeleteMessage seem to be needed. At least with these, no Permission errors in the logs anymore.

Lots of AWS ELB logs now show up in Kibana. However, I still fear, I miss quite a number of logs in Kibana.

In the filebeat logs, I see messages like this:

    Sep 24 09:14:43 hostname filebeat[7690]: 2020-09-24T09:14:43.409Z#011ERROR#011[s3]#011s3/input.go:487#011readStringAndTrimDelimiter failed: context deadline exceeded
    Sep 24 09:14:43 hostname filebeat[7690]: 2020-09-24T09:14:43.409Z#011ERROR#011[s3]#011s3/input.go:396#011createEventsFromS3Info failed processing file from s3 bucket "<my-bucket>" with name "path/to/filename.log": readStringAndTrimDelimiter failed: context deadline exceeded

and later followed by messages alike:

    Sep 24 09:14:52 hostname filebeat[7690]: 2020-09-24T09:14:52.898Z#011WARN#011[s3]#011s3/input.go:299#011Half of the set visibilityTimeout passed, visibility timeout needs to be updated
    Sep 24 09:14:52 hostname filebeat[7690]: 2020-09-24T09:14:52.901Z#011INFO#011[s3]#011s3/input.go:306#011Message visibility timeout updated to 500 seconds

Because I found this thread here, https://discuss.elastic.co/t/aws-vpcflow-errors-count-not-find-region-configuration-context-deadline-exceeded/225471

I already bumped the api_timeout to 500. (It would have been nice if the documentation would have mentioned that it's max at half of visibility_timeout and save me some head scratching why filebeat dies on start up, until I figured I have to bump visibility timeout as well...)

so when I download one of these failed files, and run wc -l I get a different number than what I get when I search in kibana:

event.module: aws AND event.dataset : aws.elb AND aws.s3.object.key : "path/to/filename.log"

for a particular example, I get wc -l 3114 lines, and kibana search like above only returns 2183 hits, where I expected it, should be the same, shouldn't it?

would I need to bump the api_timeout even higher? The node with filebeat runs in AWS same region as the SQS queue.

Sebastian

Hello! I would expect Filebeat to return all lines comparing with the original file. Yeah I would bump the visibility_timeout and api_timeout to give it a try. Also do you know how many SQS messages or logs you are ingesting? Another thing to try is to use multiple Filebeats running in parallel and point to the same SQS queue.

Regarding documentation: api_timeout is documented under S3 input https://www.elastic.co/guide/en/beats/filebeat/current/filebeat-input-s3.html#_api_timeout_2 but you are right, definitely needs more improvement!

Hi,

thanks for the answer. Checking timestamps of last change of the files in S3, I see there are new logs popping up sometimes every few seconds, sometimes it might be up to a minutes.
So, I think not that many that filebeat should be overwhelmed.

As I understand, it gets a message from the SQS queue, that a new file is in S3, then fetches the file, and then ingests it. Therefore, I think the message receive and fetch works, it's just that the parsing of the log by the module is having a problem?
Therefore I don't believe if further bumping these timeouts will help.

Actually looking at todays logs, I see a number of these

Half of the set visibilityTimeout passed, visibility timeout needs to be updated
Message visibility timeout updated to 500 seconds

but only one of these

readStringAndTrimDelimiter

in the logs. So don't know if this may even be a red herring?

I did some other test, downloading a log file, and ingested it manually into a test cluster instance, just using file input. That worked fine, number of lines in the file was the same as the number of records that ended up in elasticsearch, etc. all fields parsed properly etc.

Also what I observed while looking at it, it doesn't seem to affect all files, and I found a file, where there was 1 record more in Elasticsearch than lines in the file. As it seems from timestamps, there was a last entry with an empty message. No idea where this comes from.

Sebastian