I need to implement high availability of Logstash reading log files from S3.
Is there any way to implement HA by scaling out without duplicating events?
Each VM is going to keep its own record of which file it has read up to, so I will end up with duplicated logs...
If I share the file (via NFS) that stores the timestamp of the last processed file, both instances will compete for the same file and will probably end up reading the same files again.
You may be able to achieve that if you switch to Filebeat to collect the logs, but you will also need to enable S3 event notifications to an SQS queue and use Filebeat's aws-s3 input to consume the notification messages from SQS.
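A minimal sketch of what that Filebeat side could look like, assuming you have already created an SQS queue that receives the bucket's s3:ObjectCreated notifications. The queue URL and Logstash hosts below are placeholders, not values from your setup:

```yaml
# filebeat.yml (sketch) -- multiple Filebeat instances can read the same
# SQS queue safely: each S3 notification is delivered to one consumer at a
# time, so scaling out does not duplicate events.
filebeat.inputs:
  - type: aws-s3
    # Placeholder queue URL; point it at the queue that receives the
    # bucket's s3:ObjectCreated:* notifications.
    queue_url: https://sqs.us-east-1.amazonaws.com/123456789012/s3-log-notifications
    # How long a message stays hidden from other consumers while this
    # instance processes the referenced object.
    visibility_timeout: 300s

# Placeholder Logstash endpoints; load-balance across your instances.
output.logstash:
  hosts: ["logstash-1:5044", "logstash-2:5044"]
  loadbalance: true
```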
Another alternative is to decouple downloading the logs from processing them, but this requires adding other tools to your stack.
I have a scenario with multiple Logstash instances processing data from S3, but for this to work I use the following structure (see the sketch below).
Custom Python script to download the files -> Vector (vector.dev) to read the files and put the lines on Kafka topics -> multiple Logstash instances consuming from Kafka.
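For the last stage, the deduplication comes from Kafka's consumer groups: every Logstash instance runs the same pipeline with the same group_id, so Kafka assigns each topic partition to exactly one instance. A rough sketch of that pipeline, with placeholder broker, topic, group id, and output:

```
# logstash pipeline (sketch) -- run the same pipeline on every Logstash VM.
# Instances sharing one group_id split the topic partitions between them,
# so each event is processed by only one instance.
input {
  kafka {
    bootstrap_servers => "kafka-1:9092"           # placeholder broker
    topics            => ["s3-log-lines"]          # placeholder topic
    group_id          => "logstash-s3-consumers"   # shared consumer group
    codec             => "json"
  }
}

output {
  elasticsearch {
    hosts => ["https://elasticsearch:9200"]        # placeholder destination
  }
}
```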