I need to implement high availability for Logstash reading log files from S3.
Is there any way to implement HA via scale-out without duplicating events?
Each VM keeps its own record of which file it has read up to, so I will end up with duplicated logs...
If I share the file that stores the timestamp of the last processed object (via NFS), both instances will compete for the same state file and will probably end up reading the same S3 files again.
How do you solve this problem?
This is not possible using the Logstash S3 input.
You may be able to achieve that by switching to Filebeat to get the logs, but you will also need to enable S3 event notifications to an SQS queue and use the Filebeat aws-s3 input to consume messages from that queue.
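With SQS in front, each notification is delivered to one consumer at a time and deleted after it is processed, so multiple Filebeat instances can share the same queue without duplicating events. A minimal sketch of such a configuration (the queue URL, region, and Logstash host are placeholders you would replace with your own):

```yaml
filebeat.inputs:
  - type: aws-s3
    # SQS queue that receives the S3 "ObjectCreated" event notifications.
    # Placeholder URL - use your own queue.
    queue_url: https://sqs.us-east-1.amazonaws.com/123456789012/my-s3-logs-queue
    # How long a message stays invisible to other consumers while
    # one Filebeat instance is processing it.
    visibility_timeout: 300s

output.logstash:
  # Placeholder host - point this at your Logstash instance(s).
  hosts: ["logstash.example.com:5044"]
```

You can then run the same configuration on several VMs; SQS takes care of distributing the notifications so that each object is processed by only one instance.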
Another alternative is to decouple the process of fetching the logs from the process of parsing them, but this requires adding other tools to your stack.
I have a scenario where I have multiple Logstash instances processing data from S3, but for this to work I use the following structure:
Custom Python script to download the files -> Vector (vector.dev) to read the files and put the lines on Kafka topics -> multiple Logstash instances consuming from Kafka.
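In this setup, Kafka consumer groups are what prevent duplication: all Logstash instances join the same group, and Kafka assigns each topic partition to exactly one member. A minimal sketch of the input side of such a pipeline (broker address, topic, and group name are placeholders):

```conf
input {
  kafka {
    # Placeholder broker address - use your own Kafka cluster.
    bootstrap_servers => "kafka.example.com:9092"
    topics => ["s3-logs"]
    # All Logstash instances share this group_id, so each event
    # is consumed by only one instance.
    group_id => "logstash-s3-logs"
  }
}
```

Scaling out is then just a matter of starting more Logstash instances with the same `group_id` (up to the number of partitions on the topic).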
This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.