Elastic Agent, aws-s3-default-aws-s3-vpcflow keeps failing

I have an AWS environment that ships VPC flow logs to an S3 bucket. I am using the AWS VPC flow logs integration in Elastic Agent to process these logs and to monitor for new ones.
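For context, the integration is built on Filebeat's aws-s3 input; as far as I can tell, my setup boils down to roughly this standalone-Filebeat equivalent in bucket-polling mode (the bucket ARN below is a placeholder, not my real bucket):

```yaml
filebeat.inputs:
  - type: aws-s3
    # Bucket-polling mode: no SQS queue configured, so Filebeat
    # lists the bucket itself to discover objects.
    bucket_arn: "arn:aws:s3:::example-vpcflow-bucket"  # placeholder
```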

It connects to the bucket successfully.

When the agent starts up, CPU utilisation goes high and stays high for considerable periods of time, and memory utilisation within Filebeat also climbs very high, so it is definitely doing something.

I have left it for a couple of days, but nothing appears in my Elasticsearch cluster.

When I look at the logs for the Elastic Agent in question, I see repeated messages such as:

```
00:17:22.551 [elastic_agent][info] Component state changed aws-s3-default (HEALTHY->STOPPED): Suppressing FAILED state due to restart for '5056' exited with code '2'
00:17:22.558 [elastic_agent][info] Unit state changed aws-s3-default (HEALTHY->STOPPED): Suppressing FAILED state due to restart for '5056' exited with code '2'
00:17:22.558 [elastic_agent][info] Unit state changed aws-s3-default-aws-s3-vpcflow-97ad6888-efd5-400e-8ca3-0a799ec519d6 (HEALTHY->STOPPED): Suppressing FAILED state due to restart for '5056' exited with code '2'
00:17:23.755 [elastic_agent][info] Spawned new component aws-s3-default: Starting: spawned pid '4012'
00:17:23.756 [elastic_agent][info] Spawned new unit aws-s3-default-aws-s3-vpcflow-97ad6888-efd5-400e-8ca3-0a799ec519d6: Starting: spawned pid '4012'
00:17:23.756 [elastic_agent][info] Spawned new unit aws-s3-default: Starting: spawned pid '4012'
00:17:28.689 [elastic_agent][info] Component state changed aws-s3-default (STARTING->HEALTHY): Healthy: communicating with pid '4012'
```

When I check on the agent regularly, I can see a different PID for Filebeat each time, which confirms it is repeatedly starting and restarting.

I have a feeling the issue is that the S3 bucket in question already contains a considerable number of VPC log objects, and this is causing Filebeat to max out its memory and then fail.

I have tried increasing the memory, but it ends the same way.

I don't want to have to create a new S3 bucket.

Would love any suggestions on how to ingest the existing VPC logs and keep monitoring for new ones.

Are you using it combined with SQS or directly pointing to the bucket?

Directly pointing to the bucket - no SQS

Yeah, without SQS the input needs to get a list of all the objects in the bucket, and depending on the number of objects this can be really expensive.
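To put rough numbers on it: an S3 list call returns at most 1,000 keys per page, so a bucket with, say, 5 million objects takes around 5,000 API calls for a single listing pass, and in polling mode the input repeats that pass on every poll cycle while also keeping state for every object it has already processed. These are the two settings that drive the cost; a minimal sketch with illustrative values, not recommendations:

```yaml
filebeat.inputs:
  - type: aws-s3
    bucket_arn: "arn:aws:s3:::example-vpcflow-bucket"  # placeholder
    # How often the whole bucket (or prefix) is re-listed.
    bucket_list_interval: 900s
    # How many objects are downloaded and parsed concurrently.
    number_of_workers: 5
```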

For the old logs I would suggest setting a prefix by year and month until you have processed everything, but for new logs the best option is to use SQS.
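Something like this, as a sketch. It assumes the standard VPC flow log key layout, and the bucket name, account ID, region, and queue name are all placeholders. For the backfill, restrict the listing to one year/month at a time with `bucket_list_prefix`:

```yaml
filebeat.inputs:
  - type: aws-s3
    bucket_arn: "arn:aws:s3:::example-vpcflow-bucket"
    # Process January 2023 only; move the prefix forward once it is done.
    bucket_list_prefix: "AWSLogs/123456789012/vpcflowlogs/us-east-1/2023/01/"
```

Then, once you have caught up, switch the input over to SQS so new objects are pushed to Filebeat instead of being discovered by listing:

```yaml
filebeat.inputs:
  - type: aws-s3
    # Requires the bucket to publish s3:ObjectCreated:* event
    # notifications to this queue.
    queue_url: "https://sqs.us-east-1.amazonaws.com/123456789012/example-vpcflow-queue"
```

With SQS in place the input never has to list the bucket at all, so the size of the existing backlog stops mattering.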

Thanks for that - looks like that's the way I will go.

It is a shame the prefix does not support wildcards.
