S3 input getting stuck and not deleting SQS messages

Hi,

I'm running Filebeat 7.10.0 on Ubuntu.

I've configured filebeat to retrieve files from an S3 bucket like this:

filebeat.inputs:
  - type: s3
    enabled: true
    queue_url: "https://sqs.us-west-2.amazonaws.com/ACCOUNT_ID/SQS-QUEUE-NAME"
    access_key_id: "REDACTED"
    secret_access_key: "REDACTED"
    max_number_of_messages: 1
    tags: ['guardduty']
    json:
      keys_under_root: False
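For reference, the s3 input also has a `visibility_timeout` option, which I haven't set, so it should be using the default of 300 seconds if I read the docs correctly. A hypothetical variant raising it would look like this (the 600s value is just an example):

```yaml
# Sketch only: raise the SQS visibility timeout so that slow downstream
# acknowledgement doesn't make messages reappear in the queue early.
filebeat.inputs:
  - type: s3
    enabled: true
    queue_url: "https://sqs.us-west-2.amazonaws.com/ACCOUNT_ID/SQS-QUEUE-NAME"
    visibility_timeout: 600s   # example value, default is 300s
```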

and I use the logstash output like this:

output.logstash:
  hosts: ["127.0.0.1:5044"]
  pipelining: 0
  bulk_max_size: 0
  ssl.enabled: true
  ssl.verification_mode: none
  ssl.certificate_authorities: ["/etc/filebeat/ca-chain.pem"]

Running filebeat with debug logging, it seems to fetch messages from SQS, process S3 objects, and publish events:

grep filebeat syslog | grep -e DEBUG -e WARN -e INFO | grep -v monitor
    Mar 12 12:35:38 loghost filebeat[26366]: 2021-03-12T12:35:38.428Z#011INFO#011[input.s3]#011s3/collector.go:196#011Message visibility timeout updated to 300 seconds#011{"queue_url": "https://sqs.us-west-2.amazonaws.com/ACCOUNTID/MYBUCKET", "region": "us-west-2"}
    Mar 12 12:35:38 loghost filebeat[26366]: 2021-03-12T12:35:38.428Z#011INFO#011[input.s3]#011s3/collector.go:196#011Message visibility timeout updated to 300 seconds#011{"queue_url": "https://sqs.us-west-2.amazonaws.com/ACCOUNTID/MYBUCKET", "region": "us-west-2"}
    Mar 12 12:35:38 loghost filebeat[26366]: 2021-03-12T12:35:38.429Z#011INFO#011[input.s3]#011s3/collector.go:196#011Message visibility timeout updated to 300 seconds#011{"queue_url": "https://sqs.us-west-2.amazonaws.com/ACCOUNTID/MYBUCKET", "region": "us-west-2"}
    Mar 12 12:35:38 loghost filebeat[26366]: 2021-03-12T12:35:38.429Z#011INFO#011[input.s3]#011s3/collector.go:196#011Message visibility timeout updated to 300 seconds#011{"queue_url": "https://sqs.us-west-2.amazonaws.com/ACCOUNTID/MYBUCKET", "region": "us-west-2"}
    Mar 12 12:35:38 loghost filebeat[26366]: 2021-03-12T12:35:38.445Z#011DEBUG#011[input.s3]#011s3/collector.go:124#011Processing 10 messages#011{"queue_url": "https://sqs.us-west-2.amazonaws.com/ACCOUNTID/MYBUCKET", "region": "us-west-2"}
    Mar 12 12:35:38 loghost filebeat[26366]: 2021-03-12T12:35:38.445Z#011DEBUG#011[input.s3]#011s3/collector.go:146#011handleSQSMessage succeed and returned 1 sets of S3 log info#011{"queue_url": "https://sqs.us-west-2.amazonaws.com/ACCOUNTID/MYBUCKET", "region": "us-west-2"}
    Mar 12 12:35:38 loghost filebeat[26366]: 2021-03-12T12:35:38.445Z#011DEBUG#011[input.s3]#011s3/collector.go:146#011handleSQSMessage succeed and returned 1 sets of S3 log info#011{"queue_url": "https://sqs.us-west-2.amazonaws.com/ACCOUNTID/MYBUCKET", "region": "us-west-2"}
    Mar 12 12:35:38 loghost filebeat[26366]: 2021-03-12T12:35:38.445Z#011DEBUG#011[input.s3]#011s3/collector.go:302#011Processing file from s3 bucket "MYBUCKET" with name "AWSLogs/742964380662/GuardDuty/us-west-2/2021/03/10/2e0e5c08-9a21-32f9-aee5-781def981972.jsonl.gz"#011{"queue_url": "https://sqs.us-west-2.amazonaws.com/ACCOUNTID/MYBUCKET", "region": "us-west-2"}
    Mar 12 12:35:38 loghost filebeat[26366]: 2021-03-12T12:35:38.445Z#011DEBUG#011[input.s3]#011s3/collector.go:146#011handleSQSMessage succeed and returned 1 sets of S3 log info#011{"queue_url": "https://sqs.us-west-2.amazonaws.com/ACCOUNTID/MYBUCKET", "region": "us-west-2"}
    Mar 12 12:35:38 loghost filebeat[26366]: 2021-03-12T12:35:38.445Z#011DEBUG#011[input.s3]#011s3/collector.go:302#011Processing file from s3 bucket "MYBUCKET" with name "AWSLogs/ACCOUNTID/GuardDuty/us-west-2/2021/03/10/67265ce9-bfeb-3d29-826a-44eda4b26503.jsonl.gz"#011{"queue_url": "https://sqs.us-west-2.amazonaws.com/ACCOUNTID/MYBUCKET", "region": "us-west-2"}
    Mar 12 12:35:38 loghost filebeat[26366]: 2021-03-12T12:35:38.445Z#011DEBUG#011[input.s3]#011s3/collector.go:146#011handleSQSMessage succeed and returned 1 sets of S3 log info#011{"queue_url": "https://sqs.us-west-2.amazonaws.com/ACCOUNTID/MYBUCKET", "region": "us-west-2"}
    Mar 12 12:35:38 loghost filebeat[26366]: 2021-03-12T12:35:38.445Z#011DEBUG#011[input.s3]#011s3/collector.go:302#011Processing file from s3 bucket "MYBUCKET" with name "AWSLogs/ACCOUNTID/GuardDuty/us-west-2/2021/03/10/77a5fef0-ca1a-3100-ae7e-2692051d6066.jsonl.gz"#011{"queue_url": "https://sqs.us-west-2.amazonaws.com/ACCOUNTID/MYBUCKET", "region": "us-west-2"}
    Mar 12 12:35:38 loghost filebeat[26366]: 2021-03-12T12:35:38.445Z#011DEBUG#011[input.s3]#011s3/collector.go:146#011handleSQSMessage succeed and returned 1 sets of S3 log info#011{"queue_url": "https://sqs.us-west-2.amazonaws.com/ACCOUNTID/MYBUCKET", "region": "us-west-2"}
    Mar 12 12:35:38 loghost filebeat[26366]: 2021-03-12T12:35:38.445Z#011DEBUG#011[input.s3]#011s3/collector.go:302#011Processing file from s3 bucket "MYBUCKET" with name "AWSLogs/742964380662/GuardDuty/us-west-2/2021/03/10/727ed3c2-5339-346d-abc2-90f3905710da.jsonl.gz"#011{"queue_url": "https://sqs.us-west-2.amazonaws.com/ACCOUNTID/MYBUCKET", "region": "us-west-2"}
    Mar 12 12:35:38 loghost filebeat[26366]: 2021-03-12T12:35:38.445Z#011DEBUG#011[input.s3]#011s3/collector.go:146#011handleSQSMessage succeed and returned 1 sets of S3 log info#011{"queue_url": "https://sqs.us-west-2.amazonaws.com/ACCOUNTID/MYBUCKET", "region": "us-west-2"}
    Mar 12 12:35:38 loghost filebeat[26366]: 2021-03-12T12:35:38.445Z#011DEBUG#011[input.s3]#011s3/collector.go:302#011Processing file from s3 bucket "MYBUCKET" with name "AWSLogs/820937518675/GuardDuty/us-west-2/2021/03/10/2e0e5c08-9a21-32f9-aee5-781def981972.jsonl.gz"#011{"queue_url": "https://sqs.us-west-2.amazonaws.com/ACCOUNTID/MYBUCKET", "region": "us-west-2"}
    Mar 12 12:35:38 loghost filebeat[26366]: 2021-03-12T12:35:38.445Z#011DEBUG#011[input.s3]#011s3/collector.go:146#011handleSQSMessage succeed and returned 1 sets of S3 log info#011{"queue_url": "https://sqs.us-west-2.amazonaws.com/ACCOUNTID/MYBUCKET", "region": "us-west-2"}
    Mar 12 12:35:38 loghost filebeat[26366]: 2021-03-12T12:35:38.445Z#011DEBUG#011[input.s3]#011s3/collector.go:146#011handleSQSMessage succeed and returned 1 sets of S3 log info#011{"queue_url": "https://sqs.us-west-2.amazonaws.com/ACCOUNTID/MYBUCKET", "region": "us-west-2"}
    Mar 12 12:35:38 loghost filebeat[26366]: 2021-03-12T12:35:38.445Z#011DEBUG#011[input.s3]#011s3/collector.go:302#011Processing file from s3 bucket "MYBUCKET" with name "AWSLogs/681260632498/GuardDuty/us-west-2/2021/03/10/7f6b93db-a90b-3f68-aed0-7d6177aeb97f.jsonl.gz"#011{"queue_url": "https://sqs.us-west-2.amazonaws.com/ACCOUNTID/MYBUCKET", "region": "us-west-2"}
    Mar 12 12:35:38 loghost filebeat[26366]: 2021-03-12T12:35:38.445Z#011DEBUG#011[input.s3]#011s3/collector.go:302#011Processing file from s3 bucket "MYBUCKET" with name "AWSLogs/820937518675/GuardDuty/us-west-2/2021/03/10/eab3104f-ee93-3a7f-b070-e6c082f1a25a.jsonl.gz"#011{"queue_url": "https://sqs.us-west-2.amazonaws.com/ACCOUNTID/MYBUCKET", "region": "us-west-2"}
    Mar 12 12:35:38 loghost filebeat[26366]: 2021-03-12T12:35:38.446Z#011DEBUG#011[input.s3]#011s3/collector.go:146#011handleSQSMessage succeed and returned 1 sets of S3 log info#011{"queue_url": "https://sqs.us-west-2.amazonaws.com/ACCOUNTID/MYBUCKET", "region": "us-west-2"}
    Mar 12 12:35:38 loghost filebeat[26366]: 2021-03-12T12:35:38.446Z#011DEBUG#011[input.s3]#011s3/collector.go:302#011Processing file from s3 bucket "MYBUCKET" with name "AWSLogs/ACCOUNTID/GuardDuty/us-west-2/2021/03/10/56ca7e42-3d3c-316c-b598-7181fa070203.jsonl.gz"#011{"queue_url": "https://sqs.us-west-2.amazonaws.com/ACCOUNTID/MYBUCKET", "region": "us-west-2"}
    Mar 12 12:35:38 loghost filebeat[26366]: 2021-03-12T12:35:38.446Z#011DEBUG#011[input.s3]#011s3/collector.go:146#011handleSQSMessage succeed and returned 1 sets of S3 log info#011{"queue_url": "https://sqs.us-west-2.amazonaws.com/ACCOUNTID/MYBUCKET", "region": "us-west-2"}
    Mar 12 12:35:38 loghost filebeat[26366]: 2021-03-12T12:35:38.446Z#011DEBUG#011[input.s3]#011s3/collector.go:302#011Processing file from s3 bucket "MYBUCKET" with name "AWSLogs/ACCOUNTID/GuardDuty/us-west-2/2021/03/10/79ce7cb3-d6fc-3cc6-a180-301074af2c6b.jsonl.gz"#011{"queue_url": "https://sqs.us-west-2.amazonaws.com/ACCOUNTID/MYBUCKET", "region": "us-west-2"}
    Mar 12 12:35:38 loghost filebeat[26366]: 2021-03-12T12:35:38.445Z#011DEBUG#011[input.s3]#011s3/collector.go:302#011Processing file from s3 bucket "MYBUCKET" with name "AWSLogs/742964380662/GuardDuty/us-west-2/2021/03/10/c350a5ea-d962-3072-bb66-200073c28456.jsonl.gz"#011{"queue_url": "https://sqs.us-west-2.amazonaws.com/ACCOUNTID/MYBUCKET", "region": "us-west-2"}
    Mar 12 12:35:38 loghost filebeat[26366]: 2021-03-12T12:35:38.531Z#011DEBUG#011[processors]#011processing/processors.go:203#011Publish event: {
    Mar 12 12:35:38 loghost filebeat[26366]: 2021-03-12T12:35:38.532Z#011DEBUG#011[processors]#011processing/processors.go:203#011Publish event: {
    Mar 12 12:35:38 loghost filebeat[26366]: 2021-03-12T12:35:38.532Z#011DEBUG#011[processors]#011processing/processors.go:203#011Publish event: {
    Mar 12 12:35:38 loghost filebeat[26366]: 2021-03-12T12:35:38.532Z#011DEBUG#011[processors]#011processing/processors.go:203#011Publish event: {
    Mar 12 12:35:38 loghost filebeat[26366]: 2021-03-12T12:35:38.533Z#011DEBUG#011[processors]#011processing/processors.go:203#011Publish event: {
    Mar 12 12:35:38 loghost filebeat[26366]: 2021-03-12T12:35:38.533Z#011DEBUG#011[processors]#011processing/processors.go:203#011Publish event: {
    Mar 12 12:35:38 loghost filebeat[26366]: 2021-03-12T12:35:38.533Z#011DEBUG#011[processors]#011processing/processors.go:203#011Publish event: {
    Mar 12 12:35:38 loghost filebeat[26366]: 2021-03-12T12:35:38.533Z#011DEBUG#011[input.s3]#011s3/collector.go:154#011handleS3Objects succeed#011{"queue_url": "https://sqs.us-west-2.amazonaws.com/ACCOUNTID/MYBUCKET", "region": "us-west-2"}
    Mar 12 12:35:38 loghost filebeat[26366]: 2021-03-12T12:35:38.535Z#011DEBUG#011[processors]#011processing/processors.go:203#011Publish event: {
    Mar 12 12:35:38 loghost filebeat[26366]: 2021-03-12T12:35:38.535Z#011DEBUG#011[processors]#011processing/processors.go:203#011Publish event: {
    Mar 12 12:35:38 loghost filebeat[26366]: 2021-03-12T12:35:38.535Z#011DEBUG#011[processors]#011processing/processors.go:203#011Publish event: {
    Mar 12 12:35:38 loghost filebeat[26366]: 2021-03-12T12:35:38.536Z#011DEBUG#011[processors]#011processing/processors.go:203#011Publish event: {
    Mar 12 12:35:38 loghost filebeat[26366]: 2021-03-12T12:35:38.536Z#011DEBUG#011[processors]#011processing/processors.go:203#011Publish event: {

But then, after a few minutes of processing, it seems to get stuck, even though there are messages visible in the queue. When I restart filebeat, it starts fetching messages, processing S3 objects, and publishing events to Logstash again, and eventually I end up with lots of duplicates in Elasticsearch.

I thought that maybe the logstash output was not properly acknowledging that messages had been processed, so I played a bit with its options. I added these two parameters to the logstash output:

      pipelining: 0
      bulk_max_size: 0

With that, it eventually even deleted a few messages from the SQS queue, but not all of them, and in the end it got stuck again.
Regarding the message visibility timeout warnings, I found this here:

In case it's related, it should be fixed.

Now I'm kind of clueless how to get this right.

Sebastian

Meanwhile I've updated the stack to 7.11.2, but I see the same behaviour.

I got curious whether it's the combination of the s3 input and the logstash output. For testing I replaced the logstash output with the elasticsearch output, and all the visible messages from SQS were ingested properly and deleted as they should be. The logs show up in Kibana as expected.

Given that this works as expected when you send data directly from Filebeat to Elasticsearch, but not when the data goes from Filebeat to Logstash to Elasticsearch, I would focus on Logstash first. Are there any errors or warnings in the Logstash logs? Could you also post your Logstash pipeline here?

Also, I'm going to move this post to the Logstash forum category for now. We can always bring it back to the Beats category if the Logstash troubleshooting reveals this is an issue on the Beats side.

Hi @shaunak

Because I see successful publish events to LS in filebeat, very much the same as when ingesting normal files, my thinking was that the filebeat logstash output may not be giving feedback to the s3 input that it can delete the message from the SQS queue, whereas that feedback cycle works with the ES output. Anyway, I don't know how that works internally in filebeat.

I have two more test scenarios I'll go through tomorrow:

  • download the log files from the S3 bucket and just point filebeat file input at them, and send them via logstash
  • test the logstash S3 input

If neither works as expected, it's probably an issue with my pipeline, as you suggest; if one or both work, that will hopefully help pinpoint the issue.
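For the first test, I'm planning a minimal file input config roughly along these lines (the local path is just a placeholder for wherever I download the files to):

```yaml
# Sketch for test 1: ingest the downloaded and gunzipped GuardDuty
# files with a plain log input; /tmp/guardduty is a hypothetical path.
filebeat.inputs:
  - type: log
    enabled: true
    paths:
      - /tmp/guardduty/*.jsonl
    tags: ['guardduty']
```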

I'll get back with results.

cheers

Hi @shaunak

so I did the two tests:

  1. using the file input of filebeat
     • downloaded a number of files from s3
     • gunzipped all these files
     • ingested the files with the file input of filebeat
  2. using the s3 input of logstash
     • disabled the filebeat s3 input for a moment so it wouldn't interfere
     • pointed the logstash s3 input at the s3 buckets
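For the second test, the logstash s3 input I set up was roughly like this (options as per the plugin docs; the prefix is an assumption based on my bucket layout):

```
input {
  s3 {
    bucket            => "MYBUCKET"
    region            => "us-west-2"
    access_key_id     => "REDACTED"
    secret_access_key => "REDACTED"
    prefix            => "AWSLogs/"
  }
}
```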

Both methods work like a charm; all log lines are ingested into ES without issues.
I did not have to fiddle with the logstash pipeline regarding parsing of this specific log type.
The first test worked without touching the logstash pipeline at all; for the second test, I only had to include the s3 input and adapt the logic a bit so that the log parsing kicks in appropriately.

So I believe my initial thinking still holds: there's something odd between the filebeat s3 input and the logstash output.

Can this be moved back under Beats, as I think it's not a Logstash problem?