I am trying to move from the elasticsearch-jdbc river to Logstash to index a table with 100M+ rows, but Logstash is extremely slow even after tuning the parameters and running it with 32 threads. I had to enable paging because, even with the heap set to 24GB, I would otherwise get an out-of-memory error. I am using the official MySQL JDBC jar, and my Logstash config is below. On a 16-core, 32GB machine, Logstash indexes about 1k rows per minute, while the elasticsearch-jdbc river manages almost a million per minute. The difference is huge, and CPU usage with Logstash is extremely low; it's barely doing anything.
input {
  jdbc {
    jdbc_driver_library => "/root/mysql-connector-jdbc/mysql-connector-java-5.1.39-bin.jar"
    jdbc_connection_string => "jdbc:mysql://localhost:3306/table"
    jdbc_driver_class => "com.mysql.jdbc.Driver"
    jdbc_user => "root"
    jdbc_password => "PASSWORD"
    jdbc_page_size => 100000
    jdbc_paging_enabled => true
    statement => "SELECT * FROM a"
  }
}
output {
  elasticsearch {
    index => "test"
    document_type => "test"
    document_id => "%{id}"
    hosts => "localhost"
    flush_size => 512
    workers => 32
  }
}
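For context, my understanding is that with `jdbc_paging_enabled` the plugin wraps the statement in a derived table and pages through it with `LIMIT`/`OFFSET`, so each page becomes a query roughly like this (the exact alias is generated by the plugin; this is just a sketch):

```sql
-- page N of the original statement, with jdbc_page_size => 100000
SELECT * FROM (SELECT * FROM a) AS t1 LIMIT 100000 OFFSET 300000
```

I suspect this may be part of the problem, since MySQL still scans and discards all rows before a large `OFFSET`, so later pages get progressively slower. But even accounting for that, 1k rows per minute with almost no CPU usage seems far too low.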