Hi,
I'm relatively "new" to ELK: I know the basics of administration and everyday use, but I'm by no means a specialist. Let me briefly describe the problem I'm struggling with.
One of our services (hosted on AWS) is getting suspiciously high traffic from several addresses, so to make analysing the historical ELB (Elastic Load Balancer) logs easier I decided to load them into the ELK cluster. As long as I upload a single file manually, everything is fine: the structure is recognized correctly and I can analyze it in Kibana. The trouble is that just a few days' worth comes to almost 4,000 log files (over 1.5 million log entries), so I'm trying to ingest these files automatically.
Here is the first problem: when I configure Filebeat (8.1.1) to read the data directly from the S3 bucket, I get this error:
Input "aws-s3" failed: query s3 failed to initialize: failed to get AWS region for bucket: request canceled, context canceled
Config for this part:
filebeat.inputs:
- type: aws-s3
  enabled: true
  default_region: eu-central-1
  bucket_arn: arn:aws:s3:::elb-access-logs
  number_of_workers: 5
  bucket_list_interval: 300s
  aws_access_key_id: super_secret_key
  aws_secret_access_key: super_secret_secret
  credential_profile_name: my_work_profile
  expand_event_list_from_field: Records
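While poking at this I also started wondering whether I'm over-specifying the credentials (static keys and a profile at the same time), and whether expand_event_list_from_field even applies here, since ELB access logs are plain text rather than JSON with a Records array. If I'm reading the docs correctly the key options are named access_key_id / secret_access_key (without the aws_ prefix), so would a stripped-down config along these lines be the right direction? This is just my guess, not something I've confirmed works:

filebeat.inputs:
- type: aws-s3
  enabled: true
  # static keys only, no credential_profile_name, to rule out a credentials conflict
  access_key_id: super_secret_key
  secret_access_key: super_secret_secret
  default_region: eu-central-1
  bucket_arn: arn:aws:s3:::elb-access-logs
  number_of_workers: 5
  bucket_list_interval: 300s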
The keys are correct; aws-cli works fine with the same credentials:
❯ aws s3api get-bucket-location --bucket elb-access-logs
{
    "LocationConstraint": "eu-central-1"
}
I looked for a solution, including on this forum, but none of the suggestions I found worked for me.
So I decided to download the logs to a local disk and load them from there, but it looks like I'm doing something wrong, because Filebeat literally does nothing :/
- type: filestream
  enabled: true
  paths:
    - /path_to_my_downloaded_logs/elasticloadbalancing/eu-central-1/*.log
In this second case, the only thing that comes to mind is that the problem is the logs being scattered across dated subdirectories, e.g. ./eu-central-1/2022/03/14/, but I assumed (maybe wrongly) that the crawler would check all subdirectories on its own. If it doesn't, how do you enable that? My guess at a config is sketched below.
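I'm imagining that a recursive glob is what's needed — something like the sketch below, assuming filestream actually expands ** into the dated subdirectories (the path prefix is just a placeholder for my local download directory, and the id is a name I made up since one is apparently recommended for filestream inputs). Is that the right way to do it?

filebeat.inputs:
- type: filestream
  # an id is apparently recommended for filestream inputs; the name is my own choice
  id: elb-local-logs
  enabled: true
  paths:
    # ** is meant to cover the dated subdirectories like .../2022/03/14/
    - /path_to_my_downloaded_logs/elasticloadbalancing/eu-central-1/**/*.log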
P.S. Yes, I know I can hook S3 up to SQS and pull the data from there with the ready-made integration, and I will probably do that in the future, but right now my problem is the sheer volume of historical logs.
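For reference, I assume the SQS-based setup I'd eventually move to would look roughly like this (the queue URL and account number are placeholders, and the bucket notifications would still need to be configured on the AWS side):

filebeat.inputs:
- type: aws-s3
  enabled: true
  # SQS queue receiving the S3 "object created" notifications (placeholder URL)
  queue_url: https://sqs.eu-central-1.amazonaws.com/123456789012/elb-access-logs-queue
  credential_profile_name: my_work_profile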