Using LIMIT on JDBC input as opposed to jdbc_page_size

I am trying to import a large database from a remote server using the JDBC input plugin and have been running into some issues. After testing various configs, I've found that if I put LIMIT 10 in the statement itself, I get results back very fast. However, I see there is a jdbc_page_size parameter that handles limits and offsets so that a statement like SELECT * can be broken up and the results fetched one chunk at a time (at least that's how I understand it; my rough mental model is sketched below). When I remove the LIMIT from the statement and set jdbc_page_size => 10 to test it out, Logstash outputs:

Dec 01 17:36:59 elastisearch systemd[1]: Started logstash.
Dec 01 17:37:07 elastisearch logstash[24787]: Sending Logstash's logs to /var/log/logstash which is now configured via log4j2.properties

and then hangs. I have yet to see it actually output anything at all.
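
For what it's worth, my rough mental model of what paging does is that the plugin wraps whatever is in statement as a subquery and walks through it with LIMIT/OFFSET, something like this (illustrative SQL only, not taken from the plugin source):

SELECT count(*) FROM (SELECT * from tablename) AS t1;               -- size the result set first
SELECT * FROM (SELECT * from tablename) AS t1 LIMIT 100 OFFSET 0;   -- page 1
SELECT * FROM (SELECT * from tablename) AS t1 LIMIT 100 OFFSET 100; -- page 2
-- ...and so on until the whole table has been paged through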

Here is my config. Am I doing something wrong?

Thank you

input {
  jdbc {
    jdbc_driver_library => "/home/ubuntu/postgresql-42.1.4.jar"
    jdbc_driver_class => "org.postgresql.Driver"
    jdbc_connection_string => "jdbc:postgresql://hostname.com:5432/dbname?user=myusername&password=mypassword"
    jdbc_user => "myusername"
    jdbc_password => "mypassword"
    statement => "SELECT * from tablename"
    jdbc_paging_enabled => "true"
    jdbc_page_size => "100"
    #jdbc_fetch_size => "5"
  }
}
output {
  stdout {
    codec => rubydebug
  }
  elasticsearch {
    hosts => ["localhost"]
    index => "db-name"
  }
}
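
For comparison, the variant that comes back almost instantly is identical except the LIMIT lives in the statement and the paging options are left out:

    statement => "SELECT * from tablename LIMIT 10"
    # jdbc_paging_enabled / jdbc_page_size omitted for this test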

It finally started; it just took an extra long time. Can someone explain why putting a LIMIT in the statement returns results so fast? Just curious what it's actually doing differently under the hood.
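
My working guess, in case it helps anyone who finds this later (happy to be corrected): with LIMIT 10 inline, Postgres can stop scanning as soon as it has 10 rows, so the query returns almost immediately. With paging enabled, nothing is emitted until the up-front count finishes, and each page's OFFSET still makes the server walk past all the skipped rows:

SELECT * from tablename LIMIT 10;
-- stops after the first 10 rows it finds

SELECT * FROM (SELECT * from tablename) AS t1 LIMIT 100 OFFSET 100000;
-- still has to scan and discard 100000 rows before returning 100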

