I am trying to import a large database from a remote server using the JDBC input plugin and have been running into some issues. After testing various configs, I am seeing that if I set LIMIT 10 in the statement itself, I get the results back very fast. However, I see there is a parameter called jdbc_page_size that takes care of limits and offsets, so that a statement like SELECT * can be broken up and you get your results one chunk at a time (at least that's how I understand it?). When I remove the LIMIT from the statement and set jdbc_page_size => 10 to test it out, Logstash outputs:
Dec 01 17:36:59 elastisearch systemd[1]: Started logstash.
Dec 01 17:37:07 elastisearch logstash[24787]: Sending Logstash's logs to /var/log/logstash which is now configured via log4j2.properties
and then hangs. I have yet to see it actually output anything at all.
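For reference, my understanding from the docs is that with jdbc_paging_enabled the plugin (via Sequel) wraps the statement in a subquery and walks through it with LIMIT/OFFSET. With jdbc_page_size => 10 I'd expect something roughly like this on the server (this is my guess at the generated SQL, not something I captured):

-- I believe Sequel first counts the full result set so it knows how many pages there are
SELECT count(*) AS count FROM (SELECT * from tablename) AS t1;
-- then one query per page
SELECT * FROM (SELECT * from tablename) AS t1 LIMIT 10 OFFSET 0;
SELECT * FROM (SELECT * from tablename) AS t1 LIMIT 10 OFFSET 10;
-- ...and so on until the offset passes the count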
Here is my config. Am I doing something wrong?
Thank you
input {
  jdbc {
    jdbc_driver_library => "/home/ubuntu/postgresql-42.1.4.jar"
    jdbc_driver_class => "org.postgresql.Driver"
    jdbc_connection_string => "jdbc:postgresql://hostname.com:5432/dbname?user=myusername&password=mypassword"
    jdbc_user => "myusername"
    jdbc_password => "mypassword"
    statement => "SELECT * from tablename"
    jdbc_paging_enabled => "true"
    jdbc_page_size => "100"
    #jdbc_fetch_size => "5"
  }
}
output {
  stdout {
    codec => rubydebug
  }
  elasticsearch {
    hosts => ["localhost"]
    index => "db-name"
  }
}
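For comparison, this is the only change in the fast test: the same config, but with the paging options removed and the limit in the statement itself:

statement => "SELECT * from tablename LIMIT 10"
# jdbc_paging_enabled and jdbc_page_size left out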
Update: it finally started, it just took an extra-long time. Can someone explain why putting LIMIT in the statement itself returns results so fast? I'm just curious what the paging is actually doing differently under the hood.
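If it helps anyone reproduce this, I think the difference should show up directly in psql with something like the following (table name is a placeholder, and the second query is my guess at what a deep page looks like):

EXPLAIN ANALYZE SELECT * from tablename LIMIT 10;
EXPLAIN ANALYZE SELECT * FROM (SELECT * from tablename) AS t1 LIMIT 100 OFFSET 100000;

My guess is the first plan can stop after ten rows, while the OFFSET one has to scan and throw away every row before the page it returns.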