AWS/CloudTrail Integration for Elastic Agent

Hello! I am using the default overarching AWS integration for Elastic Agent to collect CloudTrail logs from S3. I am collecting logs successfully, but the dataset is HUGE: I enabled the integration yesterday and it is still processing logs from September.

Does the integration read from when the trail first started and work towards real time? Is there a way to fix it so it only cares about new events from X start date? Would increasing CPU/RAM requests on my ingest nodes make it go faster?

TLDR: I have the CloudTrail integration within the AWS integration working successfully, but it will be stuck for days catching up to get to real time.

I'm assuming you configured it to poll the S3 bucket that holds your CloudTrail logs, right?

If so, it will process all files in the bucket. Depending on the number of files, this can take a really long time, as polling from S3 can be pretty slow.
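To get a feel for how large that backlog is, you could count the objects the poller has to work through. This is only a rough sketch: it assumes boto3 is installed and AWS credentials are available, and the bucket name and the `AWSLogs/` prefix in the usage note are placeholders, not values from this thread.

```python
from typing import Any, Iterable, Mapping


def summarize_backlog(pages: Iterable[Mapping[str, Any]]) -> tuple[int, int]:
    """Return (object_count, total_bytes) across list_objects_v2 result pages."""
    count = 0
    total_bytes = 0
    for page in pages:
        # Pages with no matching keys have no "Contents" entry at all.
        for obj in page.get("Contents", []):
            count += 1
            total_bytes += obj["Size"]
    return count, total_bytes


def list_pages(bucket: str, prefix: str):
    """Return an iterator of list_objects_v2 pages (needs boto3 and credentials)."""
    import boto3  # imported lazily so summarize_backlog can be used without it

    paginator = boto3.client("s3").get_paginator("list_objects_v2")
    return paginator.paginate(Bucket=bucket, Prefix=prefix)
```

For example, `summarize_backlog(list_pages("my-cloudtrail-bucket", "AWSLogs/"))` would return the number of objects and their combined size under that prefix. On a very large bucket the listing itself can take a while.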

I don't think so. You can try increasing the number of workers in the integration to see if that helps.

Another thing: depending on the volume of events in your CloudTrail, you will never be able to reach real time in polling mode, as it does not scale. The recommendation is to use SQS notifications, which let multiple agents consume the data from a high-rate CloudTrail bucket.

But SQS notifications only work from the moment they are configured; they do not cover data that is already in the bucket.

This was very helpful, thank you!

Do you think I should use the dedicated CloudTrail integration? Would I just have to enable event notifications to SQS on my CloudTrail bucket and then provide the queue name to the integration?

It is the same integration. The difference is that if you add the AWS CloudTrail integration, it shows only the CloudTrail settings to configure. If you edit it after installing, you will see all the other AWS integrations listed as disabled.

You need to configure notifications from your CloudTrail S3 bucket to an SQS queue, and then use this queue in the Elastic Agent configuration.

But this only works for objects created after the notification was configured.
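As a rough illustration of the bucket-to-queue notification setup, here is a minimal boto3-based sketch. The function names, the notification `Id`, and the `AWSLogs/` prefix filter are my own assumptions for illustration; the queue policy is the usual pattern of letting the S3 service send messages on behalf of one bucket. Check it against the AWS walkthrough before relying on it.

```python
import json


def build_queue_policy(queue_arn: str, bucket_name: str) -> str:
    """Build an SQS access policy that lets S3 send messages for one bucket."""
    policy = {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Principal": {"Service": "s3.amazonaws.com"},
            "Action": "sqs:SendMessage",
            "Resource": queue_arn,
            # Only accept messages that originate from this bucket.
            "Condition": {"ArnLike": {"aws:SourceArn": f"arn:aws:s3:::{bucket_name}"}},
        }],
    }
    return json.dumps(policy)


def build_notification_config(queue_arn: str, prefix: str = "AWSLogs/") -> dict:
    """Build a bucket notification config that sends object-created events to SQS."""
    return {
        "QueueConfigurations": [{
            "Id": "cloudtrail-to-sqs",  # hypothetical identifier
            "QueueArn": queue_arn,
            "Events": ["s3:ObjectCreated:*"],
            # Assumed prefix; adjust to where your trail writes its objects.
            "Filter": {"Key": {"FilterRules": [{"Name": "prefix", "Value": prefix}]}},
        }]
    }


def apply_notification(bucket_name: str, queue_url: str, queue_arn: str) -> None:
    """Apply both settings (needs boto3 and credentials; overwrites existing ones)."""
    import boto3  # imported lazily so the builders above work without it

    boto3.client("sqs").set_queue_attributes(
        QueueUrl=queue_url,
        Attributes={"Policy": build_queue_policy(queue_arn, bucket_name)},
    )
    boto3.client("s3").put_bucket_notification_configuration(
        Bucket=bucket_name,
        NotificationConfiguration=build_notification_config(queue_arn),
    )
```

Note that `put_bucket_notification_configuration` replaces the bucket's whole notification configuration, and setting the `Policy` attribute replaces the queue's existing policy, so merge in anything already configured before calling `apply_notification`.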

You cannot have both configured, so I recommend first waiting for it to process the old data. If you do not care about the old data, just disable polling and configure it to use the SQS notifications.

Perfect, thank you so much! The main goal is to ingest CloudTrail logs as close to real time as possible. It definitely looks like the SQS route you mentioned is the best way to achieve that!

Yeah, using SQS notifications is the best approach as you can scale the number of agents.

I think this is the AWS documentation: Walkthrough: Configuring a bucket for notifications (SNS topic or SQS queue) - Amazon Simple Storage Service

If you are using Elastic Cloud, you could also explore sending data using Firehose; see Amazon Kinesis Data Firehose overview | Amazon Kinesis Data Firehose Ingest Guide | Elastic

This is a great option as it really simplifies the architecture.

Otherwise, if you're not on Elastic Cloud, I would echo the previous recommendation and strongly encourage using SQS + S3. To improve throughput, I would follow the guidance in Get the most from Elastic Agent with Amazon S3 and SQS | Elastic Blog, which suggests using the "throughput" performance preset under Fleet > Settings > Output (more info here: Using Elastic Agent Performance Presets in 8.12 | Elastic Blog).