Poor performance - SQS / S3

I'm using elastic agents on k8s - managed with elastic operator.

I have very poor performance for ingesting flow logs via SQS from S3.
In the debug logs for elastic agent I can see a lot of logs like this:

[elastic_agent.filebeat][debug] Incoming log.file.path value: https://<BUCKET>.s3.us-east-2.amazonaws.com/AWSLogs/<ACCOUNT>/vpcflowlogs/us-west-2/2024/09/24/11/<ACCOUNT>_vpcflowlogs_us-west-2_fl-0319a7b3559f337e9_20240924T1140Z_da0963c2.log.gz

This log appears hundreds of times, pointing to the same file... is this expected?


What is the maximum number of concurrent SQS messages that you are using? The default value is 5, which is pretty low for high-rate sources like CloudTrail, VPC Flow Logs, etc.
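
For reference, this is roughly what that setting looks like in a standalone Filebeat config (a sketch; the queue URL is a placeholder, and in the Fleet-managed AWS integration the same knob is the "Max number of messages" field of the integration policy):

```yaml
filebeat.inputs:
  - type: aws-s3
    queue_url: https://sqs.us-east-2.amazonaws.com/<ACCOUNT>/<QUEUE>  # placeholder
    max_number_of_messages: 20  # default is 5; raise it for high-volume queues
```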

Is it exactly the same file? Can you share some evidence of this? Not sure when this is logged.

What is the maximum number of concurrent SQS messages that you are using? The default value is 5, which is pretty low for high-rate sources like CloudTrail, VPC Flow Logs, etc.

I used the default settings and also tried 20, but didn't see any difference except connections being reset by S3.

Is it exactly the same file? Can you share some evidence of this? Not sure when this is logged.

Yes, when I search the agent logs with the following query:

elastic_agent.id:d03ec10c-5ec5-410b-a3c5-04b443318f49 and (data_stream.dataset:elastic_agent or data_stream.dataset:elastic_agent.filebeat) and "0319a7b3559f337e9_20240924T1140Z_da0963c2"

it returns more than 10k results.

It is not clear what this query will return; you didn't share its results, nor multiple log lines from the file.

But the way AWS sends its logs is to pack multiple events inside a JSON object named Records, and the Elastic Agent then unnests this object into multiple events, so you will have multiple documents in Elasticsearch where the source file is the same.

So if you have multiple different events in Kibana showing the same log.file.path, this is expected.
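
As an illustration of that expansion, here is a minimal sketch (not Agent code, just hypothetical Python mimicking the behavior) of how one gzipped flow-log object becomes many events that all carry the same log.file.path:

```python
import gzip

# Hypothetical miniature of what the aws-s3 input does: one S3 object
# expands into many events, all sharing the same log.file.path.
sample = (
    b"version account-id interface-id\n"   # VPC flow-log header line
    b"2 123456789012 eni-aaa\n"
    b"2 123456789012 eni-bbb\n"
)
blob = gzip.compress(sample)  # stand-in for the downloaded .log.gz object

path = "https://<BUCKET>.s3.us-east-2.amazonaws.com/<KEY>.log.gz"  # placeholder
events = [
    {"message": line, "log.file.path": path}
    for line in gzip.decompress(blob).decode().splitlines()
    if line and not line.startswith("version")  # skip the header line
]
print(len(events))  # 2 events from a single source object
```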

But which value are you using now, 5 or 20? How many agents do you have consuming data from the same queue?

VPC Flow Logs can be pretty noisy; you may need multiple agents consuming them in parallel.

Also, how is your output configured? You probably need to change it to use the throughput-optimized performance settings.
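
For context, recent stack versions let the Fleet Elasticsearch output use performance presets; a sketch of an output using the throughput preset (the hostname is a placeholder, and preset availability depends on your version):

```yaml
outputs:
  default:
    type: elasticsearch
    hosts: ["https://elasticsearch.example.internal:9200"]  # placeholder
    preset: throughput  # trades latency for bulk indexing throughput
```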

I checked one of the flow-log files and it had 40k rows, so that can explain why I see this log entry hundreds of times.

But which value are you using now, 5 or 20? How many agents do you have consuming data from the same queue?
VPC Flow Logs can be pretty noisy; you may need multiple agents consuming them in parallel.

Right now I have 20 agents with the default setting (5). I can see that, on average, one agent consumes 200-250 events per second.
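
A quick back-of-envelope check on those numbers (values taken from this thread; everything here is an estimate):

```python
# Numbers reported in this thread; all values are estimates.
agents = 20
events_per_sec_per_agent = 225   # midpoint of the observed 200-250/s
rows_per_file = 40_000           # one sampled flow-log object

total_eps = agents * events_per_sec_per_agent
seconds_per_file = rows_per_file / events_per_sec_per_agent

print(f"aggregate throughput ~= {total_eps} events/s")
print(f"one agent spends ~= {seconds_per_file:.0f}s (~{seconds_per_file / 60:.1f} min) per file")
```

At ~225 events/s, a single 40k-row object ties up one agent slot for roughly three minutes, which suggests raising per-agent throughput (output settings) matters at least as much as adding more agents.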

Also, how is your output configured? You probably need to change it to use the throughput-optimized performance settings.

Can I change it when using the basic license? It was greyed out when I tried to change it in my agent policy settings.

Yes, it should work with the basic license. Can you share a screenshot? Did you define your Fleet Elasticsearch output in kibana.yml?
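
For what it's worth, one common reason those fields are greyed out is that the output was predefined in kibana.yml, which makes it config-managed and read-only in the UI. A sketch of such a definition (the id and hostname are placeholders):

```yaml
xpack.fleet.outputs:
  - id: fleet-default-output  # placeholder id
    name: default
    type: elasticsearch
    hosts: ["https://elasticsearch.example.internal:9200"]  # placeholder
    is_default: true
```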

I managed to change the default profile, but it didn't have any impact on the number of events.

I also added more agents, up to 60, but that didn't have any impact either.

Maybe the limitation is on the Elasticsearch side?
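
One way to check that is whether Elasticsearch's write thread pool is queueing or rejecting bulk requests. A read-only stats call you can run in Kibana Dev Tools (sustained non-zero rejected counts would point at the Elasticsearch side):

```
GET _nodes/stats/thread_pool/write?filter_path=nodes.*.thread_pool.write
```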