Logstash JDBC schedule handling

Hi,

I'm using multiple JDBC inputs, each running its own statement on a 10-minute schedule, like this:

input {
  jdbc {
    # DB connection
    jdbc_connection_string => "jdbc:postgresql://myhost.fr:5432/icat"
    jdbc_user => "db_ro"
    jdbc_password => "secret"
    # JDBC driver
    jdbc_driver_library => "/nix/store/4m16yjh855l52vc6f1yibgxhmh7mcdmd-postgresql-jdbc-42.2.2/share/java/postgresql-jdbc.jar"
    jdbc_driver_class => "org.postgresql.Driver"
    # Query
    statement => "select count(id) as users_count from user"
    # Schedule
    schedule => "*/10 * * * *"
  }
  jdbc {
    # DB connection
    jdbc_connection_string => "jdbc:postgresql://myhost.fr:5432/icat"
    jdbc_user => "db_ro"
    jdbc_password => "secret"
    # JDBC driver
    jdbc_driver_library => "/nix/store/4m16yjh855l52vc6f1yibgxhmh7mcdmd-postgresql-jdbc-42.2.2/share/java/postgresql-jdbc.jar"
    jdbc_driver_class => "org.postgresql.Driver"
    # Query
    statement => "select count(id) as objects_count from data"
    # Schedule
    schedule => "*/10 * * * *"
  }
  jdbc {
    # DB connection
    jdbc_connection_string => "jdbc:postgresql://myhost.fr:5432/icat"
    jdbc_user => "db_ro"
    jdbc_password => "secret"
    # JDBC driver
    jdbc_driver_library => "/nix/store/4m16yjh855l52vc6f1yibgxhmh7mcdmd-postgresql-jdbc-42.2.2/share/java/postgresql-jdbc.jar"
    jdbc_driver_class => "org.postgresql.Driver"
    # Query
    statement => "select count(*) as owner_count from (select distinct owner_name from coll) as temp"
    # Schedule
    schedule => "*/10 * * * *"
  }
  jdbc {
    # DB connection
    jdbc_connection_string => "jdbc:postgresql://myhost.fr:5432/icat"
    jdbc_user => "db_ro"
    jdbc_password => "secret"
    # JDBC driver
    jdbc_driver_library => "/nix/store/4m16yjh855l52vc6f1yibgxhmh7mcdmd-postgresql-jdbc-42.2.2/share/java/postgresql-jdbc.jar"
    jdbc_driver_class => "org.postgresql.Driver"
    # Query
    statement => "select sum(size) as used_space from data"
    # Schedule
    schedule => "*/10 * * * *"
  }
}

Occasionally the DB stops replying to some of these queries (particularly the sum of size), and Logstash seems to pile up subsequent requests without end. The other statements keep working in the meantime, but I have to kill and restart Logstash before getting any new values for the failed query.

Is it possible to prevent Logstash from queuing a new run of a statement before the reply to the previous run has been received (and how would that handle, say, a network outage)? Does the sql_last_value persistent state parameter do this? From what I understand it doesn't, but I could be mistaken. Is there a way to achieve this in Logstash, or is it normally handled on the DB side by cancelling queries that take too long?

Thanks

Can you offset the cron schedules so the statements don't all fire at the same time? For example:
0,10,20,30,40,50 * * * *
5,15,25,35,45,55 * * * *
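
Applied to the config above, only the schedule lines change. A sketch for two of the inputs (the connection and driver settings stay exactly as they are):

  jdbc {
    # connection and driver settings as before
    statement => "select count(id) as users_count from user"
    schedule => "0,10,20,30,40,50 * * * *"
  }
  jdbc {
    # connection and driver settings as before
    statement => "select sum(size) as used_space from data"
    schedule => "5,15,25,35,45,55 * * * *"
  }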

The sql_last_value will not help in this case: it is used in a WHERE clause to fetch records later/greater than a previously stored value, as in the sketch below.
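
For reference, the usual pattern looks something like this. A sketch only: the counts table and its id column are hypothetical names, and the connection/driver settings are omitted:

  jdbc {
    # connection and driver settings as in your config
    # only fetch rows newer than the last persisted value;
    # :sql_last_value is stored between runs by the plugin
    statement => "select id, users_count from counts where id > :sql_last_value order by id"
    use_column_value => true
    tracking_column => "id"
    tracking_column_type => "numeric"
    schedule => "*/10 * * * *"
  }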

Another option is to use a CLI db script in a cron job that executes a stored procedure: it takes the counts and inserts a record into a "counts" table (with a timestamp or an incrementing number), and LS then reads that table using sql_last_value. This has the added bonus that LS can be restarted without losing count updates.
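
A rough sketch of the DB side, assuming Postgres. The counts table, take_counts() function and db_rw user are hypothetical names; note that inserting needs a writable account, and that user must be quoted in the query because it is a reserved word:

-- counts table with an incrementing id for LS to track
create table counts (
    id          bigserial primary key,
    taken_at    timestamptz not null default now(),
    users_count bigint,
    used_space  bigint
);

-- stored function that takes the counts and appends a row
create function take_counts() returns void language sql as $$
    insert into counts (users_count, used_space)
    values ((select count(id) from "user"),
            (select sum(size) from data));
$$;

and a crontab entry on the DB server:

*/10 * * * * psql -h myhost.fr -d icat -U db_rw -c "select take_counts();"

LS then reads the counts table with the sql_last_value pattern shown above, so a slow count only delays the next row instead of stalling the pipeline.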

Other than pgAgent, Postgres does not seem to have a built-in scheduler.


Thanks for your reply.

I like the idea of using a script on the DB server to update a counts table that LS then reads; I'll look into that.

Cheers,
