These days I'm using Logstash as an ETL tool (I love Logstash; if I could, I would use it to make coffee too): it reads data from MySQL, enriches it with multiple jdbc_streaming filters against the same database (in some cases I also manipulate the data), and writes it somewhere else.
I have a single Logstash instance with 4 cores and 16 GB of RAM, with 8 GB set for both Xms and Xmx. Moving 18 million rows (page size 500,000) takes Logstash 24 hours; I run the config with 4 workers and a batch.size of 1000.
In another use case I moved 27 million rows (page size 100,000) from another table with similar filters, using the same options (workers and batch.size) on the same instance. In that case Logstash took 12 hours to complete the move.
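To put the two runs side by side, here is a rough sketch of the arithmetic in Python. It uses only the figures quoted above; the names `runs` and `summarize` are made up for illustration, and it assumes each run was spread evenly over its total duration, which may not be true in practice.

```python
# Figures taken from the two runs described in this post.
runs = {
    "run_1": {"rows": 18_000_000, "page_size": 500_000, "hours": 24},
    "run_2": {"rows": 27_000_000, "page_size": 100_000, "hours": 12},
}


def summarize(run):
    """Derive average throughput and per-page timing for one run."""
    seconds = run["hours"] * 3600
    pages = run["rows"] // run["page_size"]
    return {
        "rows_per_second": run["rows"] / seconds,
        "pages": pages,
        # Assumption: time is spread evenly across pages.
        "seconds_per_page": seconds / pages,
    }


summary = {name: summarize(run) for name, run in runs.items()}
for name, stats in summary.items():
    print(name, stats)
```

By these numbers the first run averages about 208 rows per second over 36 pages, while the second averages 625 rows per second over 270 pages, so the second run is roughly three times faster per row.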
The workload on the source DB is fine. I don't understand why the behavior is different. Is a smaller page size preferred? Do I have to use persistent queues?
A dirty solution would be to split the result set in two and run the two pipelines at the same time, but I think there must be a better solution than this.
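A minimal sketch of what such a split could look like, in Python. It assumes the table has a numeric, reasonably dense primary key; the column name `id`, the table name `my_table`, and the helper `split_id_range` are all hypothetical. Each resulting range would become the WHERE clause of one pipeline's input statement.

```python
def split_id_range(min_id, max_id, parts):
    """Split the inclusive range [min_id, max_id] into contiguous chunks.

    Chunks differ in size by at most one id. Empty chunks are skipped
    when there are more parts than ids.
    """
    total = max_id - min_id + 1
    base, extra = divmod(total, parts)
    ranges = []
    lo = min_id
    for i in range(parts):
        # The first `extra` chunks each take one leftover id.
        size = base + (1 if i < extra else 0)
        if size == 0:
            continue
        hi = lo + size - 1
        ranges.append((lo, hi))
        lo = hi + 1
    return ranges


# Hypothetical example: ids 1..18,000,000 split across two pipelines.
for lo, hi in split_id_range(1, 18_000_000, 2):
    print(f"SELECT * FROM my_table WHERE id BETWEEN {lo} AND {hi}")
```

If the ids have large gaps, equal id ranges will not mean equal row counts, so the two pipelines could still finish at very different times.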