I created a Logstash config that indexes data from a MySQL DB into Elasticsearch on a schedule.
Here is my config:
```
input {
  jdbc {
    jdbc_connection_string => "jdbc:mysql://mysql:3306/smartlikes_staging"
    jdbc_user              => "root"
    jdbc_password          => ""
    jdbc_driver_library    => "jdbc-driver.jar"
    jdbc_driver_class      => "com.mysql.jdbc.Driver"
    jdbc_page_size         => 100000
    jdbc_paging_enabled    => true
    use_column_value       => true
    tracking_column        => "updatedat"
    tracking_column_type   => "timestamp"
    schedule               => "*/2 * * * *"
    last_run_metadata_path => "/configs/products/.logstash_jdbc_last_run"
    statement              => "SELECT id, title, brand, description FROM products WHERE updatedAt > :sql_last_value ORDER BY id"
  }
}

filter {
  mutate {
    add_field => { "[@metadata][product_id]" => "%{id}" }
  }
  mutate {
    remove_field => ["id"]
  }
}

output {
  stdout {
    codec => json_lines
  }
  elasticsearch {
    "hosts"         => "elasticsearch:9200"
    "index"         => "products"
    "doc_as_upsert" => true
    "action"        => "update"
    "document_type" => "_doc"
    "document_id"   => "%{[@metadata][product_id]}"
  }
}
```
The problem is:
When I run Logstash I expect it to load all items from the DB that match the query (paginated), send them to ES page after page, and do that every 2 minutes. What actually happens is that it runs roughly every 5 minutes and sends only one page to ES per run. So if I start Logstash with "--- 2019-04-03 00:00:00.000000000 Z" in the .logstash_jdbc_last_run file, it loads the first 100,000 items and sends them to ES, but the value in .logstash_jdbc_last_run does not change. Then it starts again about 5 minutes later and sends the next 100,000 items to ES.
Once it has worked through everything, the value in the .logstash_jdbc_last_run file is correctly updated to the latest value from the updatedAt column (that part is fine).
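For context, this is roughly the paged SQL I assume the jdbc input generates when jdbc_paging_enabled is on. I have not captured it from the MySQL query log; the subquery wrapping and the date literal are just my guess at its shape:

```sql
-- Assumed shape of the generated queries (illustrative, not captured from MySQL):
SELECT * FROM (
  SELECT id, title, brand, description
  FROM products
  WHERE updatedAt > '2019-04-03 00:00:00'
  ORDER BY id
) AS t1
LIMIT 100000 OFFSET 0;

-- What I expected: the next pages (OFFSET 100000, 200000, ...) issued within the
-- same scheduled run, until all rows matching :sql_last_value have been sent to ES.
```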
It's not critical, though; at worst I can increase the page size and make it run more often (a sketch of that tweak is below). But it seems something is going wrong here, or my config is invalid.
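For reference, this is the kind of workaround I mean. The numbers are placeholders I have not tested, and only the two settings that would change are shown:

```
jdbc {
  # ... same connection, driver and tracking settings as above ...
  jdbc_paging_enabled => true
  jdbc_page_size      => 500000       # larger page so one run covers more rows (placeholder value)
  schedule            => "* * * * *"  # run every minute instead of every 2 minutes
}
```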