AWS S3 logs --> SQS --> Elasticsearch

Hi guys, we can see that sending logs directly from S3 --> Elastic SIEM performs poorly compared to going through SQS. Elastic is working on enhancements for uploading data from S3 --> SIEM directly. In the meantime, I created a Lambda function subscribed to the S3 bucket to pull the files and post them to SQS.
However, it's still not working as expected. If anyone has done and tested this, please help :slight_smile:

Hello and welcome,

It is not clear what your issue is or how it relates to any tool in the Elastic Stack.

Also, I'm not sure what you mean here: performance of what tool? How are you sending logs from S3 to Elastic?

Please provide more context about what you are trying to do, which Elastic tool you are using, and what is not working.

Elastic is working on the performance and reliability of the aws-s3 input in polling mode. The aws-s3 input works in two modes: polling and SQS. In polling mode, the aws-s3 input polls files directly from an S3 bucket (S3 > input). In SQS mode, S3 sends a notification to SQS, and the aws-s3 input processes it (S3 > SQS > input). However, if the S3 bucket contains a large number of S3 objects, the SQS mode offers lower latency and higher scalability.

I am facing challenges with option 1 (polling mode) and am switching to option 2 (SQS). I have written functions in AWS to do so. However, there is very limited documentation from Elastic on option 2.

Can you provide a link to this information?

It is still not clear what your issue is and how it relates to any Elastic tool.

Since you are talking about the aws-s3 input, I'm assuming you are using Filebeat or Elastic Agent.

To get data from S3 buckets using the aws-s3 input, it is recommended to use SQS queue notifications instead of polling the bucket. Polling the bucket is a bad approach, and I don't think it can be improved in any significant way, because polling the bucket does not scale. So the recommendation is to use an SQS queue that receives notifications about the files in the bucket.

The way this works is the following:

  • You configure AWS to send a notification to an SQS queue every time a new file is created in the S3 bucket you want to consume.
  • Filebeat or Elastic Agent will subscribe to the SQS queue and receive a notification every time a new file is added to the S3 bucket.
  • The SQS notification contains information about the file that was added; Filebeat or Elastic Agent will then download the file from the S3 bucket and process it.
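To make the last step concrete, here is a rough Python sketch of what a consumer has to do with the body of one SQS message: read the bucket name and object key out of the S3 event notification. This is only an illustration of the message shape, not how Filebeat or Elastic Agent implement it internally. The function name `objects_from_sqs_body`, the bucket name, and the object key are made up for the example.

```python
import json
from typing import List, Tuple
from urllib.parse import unquote_plus


def objects_from_sqs_body(body: str) -> List[Tuple[str, str]]:
    """Return (bucket, key) pairs named in one S3 event notification."""
    message = json.loads(body)
    pairs = []
    # S3 sends an "s3:TestEvent" message without "Records" when the
    # notification is first configured, so default to an empty list.
    for record in message.get("Records", []):
        # Only object-creation events point at a file worth downloading.
        if not record.get("eventName", "").startswith("ObjectCreated"):
            continue
        bucket = record["s3"]["bucket"]["name"]
        # Object keys are URL-encoded in the notification payload.
        key = unquote_plus(record["s3"]["object"]["key"])
        pairs.append((bucket, key))
    return pairs


# Hypothetical notification body; the names are placeholders.
sample_body = json.dumps({
    "Records": [{
        "eventSource": "aws:s3",
        "eventName": "ObjectCreated:Put",
        "s3": {
            "bucket": {"name": "example-log-bucket"},
            "object": {"key": "AWSLogs/my+log+file.json.gz", "size": 1024},
        },
    }]
})

print(objects_from_sqs_body(sample_body))
```

Note that the message only carries the bucket and key (plus some metadata such as the size). The consumer still has to fetch the object itself from S3.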

The documentation on how this works in Filebeat is here.

How you create the SQS queue has nothing to do with Elastic; this is done entirely on the AWS side. I'm not sure what you are trying to do with that Lambda function, as S3 can send the notifications to SQS natively, as described here.
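As a rough sketch of the native route (no Lambda involved), the bucket notification can be set up with boto3 along these lines. The helper names `build_queue_notification` and `apply_notification`, the queue ARN, and the `AWSLogs/` prefix are placeholders I made up for the example. It also assumes the SQS queue's access policy already allows S3 to send messages to it, and keep in mind that this call replaces whatever notification configuration the bucket already has.

```python
def build_queue_notification(queue_arn: str, prefix: str = "") -> dict:
    """Build an S3 notification config that targets one SQS queue."""
    queue_config = {
        "QueueArn": queue_arn,
        # Notify on every kind of object creation (Put, Post, Copy, ...).
        "Events": ["s3:ObjectCreated:*"],
    }
    if prefix:
        # Optionally restrict notifications to keys under a prefix.
        queue_config["Filter"] = {
            "Key": {"FilterRules": [{"Name": "prefix", "Value": prefix}]}
        }
    return {"QueueConfigurations": [queue_config]}


def apply_notification(bucket: str, queue_arn: str, prefix: str = "") -> None:
    """Apply the config to a bucket (needs boto3 and AWS credentials)."""
    import boto3  # third-party; imported here so the builder runs without it

    s3 = boto3.client("s3")
    # Overwrites the bucket's existing notification configuration.
    s3.put_bucket_notification_configuration(
        Bucket=bucket,
        NotificationConfiguration=build_queue_notification(queue_arn, prefix),
    )


# Placeholder ARN, only used to show the resulting structure.
config = build_queue_notification(
    "arn:aws:sqs:us-east-1:123456789012:example-queue", prefix="AWSLogs/"
)
print(config)
```

The same thing can be done from the AWS console or with infrastructure-as-code tooling. The point is only that S3 publishes to SQS by itself once this is configured.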

The SQS notification has information about the files in the S3 bucket, not the content of the files. This way you can scale and have multiple Filebeat/Elastic Agent instances subscribing to the same queue to know which files need to be downloaded and processed.
