Hi
We have Logstash running on our worker servers, each with an identical config, so that if one host goes down we are still processing events.
The source data is MySQL; Logstash pushes the latest records to Elasticsearch, e.g.:
input {
  jdbc {
    type => "event_v4"
    clean_run => false
    jdbc_driver_library => "/usr/share/java/mysql-connector-java.jar"
    jdbc_driver_class => "com.mysql.jdbc.Driver"
    jdbc_connection_string => "jdbc:mysql://xx"
    jdbc_user => "xx"
    jdbc_password => "xx"
    jdbc_paging_enabled => true
    jdbc_page_size => 10000
    schedule => "*/5 * * * *"
    statement => "xx"
  }
}
The job then runs every 5 minutes via the schedule option above.
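For context, the output side is just the stock elasticsearch output, roughly like the sketch below (host, index name and the id field are illustrative). Keying document_id on the row's primary key is what makes the duplicate runs harmless, since a second write just overwrites the same document:

output {
  elasticsearch {
    hosts => ["xx"]
    index => "events_v4"
    document_id => "%{id}"
  }
}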
Ideally I want to avoid doubling up the work, i.e. both servers picking up the same records and pushing them up. It doesn't cause issues, but it's a waste of resources. Because we use Ansible to build the servers, it's hard to set the cron template differently on each.
I was thinking that once the Logstash job runs, we could update the MySQL records with a 'lastIndexed' date, then in the jdbc input's statement exclude anything where that date is within the last few minutes, e.g.:
WHERE lastUpdated > :sql_last_value AND lastIndexed < :sql_last_value
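In full, the statement would presumably need to be something like this (table and column names are placeholders; rows that have never been indexed would need a NULL guard, otherwise they'd never be picked up at all):

SELECT *
FROM events
WHERE lastUpdated > :sql_last_value
  AND (lastIndexed IS NULL OR lastIndexed < :sql_last_value)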
Is that possible? How can we update that column in MySQL once the records are indexed? (We currently only read from MySQL and write to Elasticsearch.)
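The closest thing I've found is the community logstash-output-jdbc plugin (installed with bin/logstash-plugin install logstash-output-jdbc; it isn't bundled with Logstash). If I'm reading its docs right, the write-back would be roughly this sketch, where the table, column and id field names are placeholders:

output {
  jdbc {
    driver_jar_path => "/usr/share/java/mysql-connector-java.jar"
    connection_string => "jdbc:mysql://xx"
    # first array element is the SQL with ? placeholders,
    # the remaining elements are event fields bound to them in order
    statement => [ "UPDATE events SET lastIndexed = NOW() WHERE id = ?", "id" ]
  }
}

Though I'm not sure how well that copes with the two nodes racing between the SELECT and the UPDATE, or with the 10000-row pages.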
Or any better way to handle this scenario?