How does Logstash work?

I have done quite a bit of development on the jdbc input.

You turned on jdbc_paging_enabled, but the default page size is 100000, as set in the plugin source: config :jdbc_page_size, :validate => :number, :default => 100000
The jdbc_fetch_size you set is 4000 and is passed to the driver, meaning it takes 25 round trips to the DB (100000 / 4000) before a page of 100000 rows is built, and only once the page is built in LS memory can the events start being added to the work queue. While this is happening the workers are idle.
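As a sketch, the input settings in play look something like this (the connection string, user, and statement are placeholders for illustration, not taken from your config):

```
input {
  jdbc {
    # placeholder connection details - illustration only
    jdbc_connection_string => "jdbc:postgresql://localhost:5432/mydb"
    jdbc_user => "me"
    statement => "SELECT * FROM my_table"
    jdbc_paging_enabled => true
    # jdbc_page_size not set, so the default of 100000 applies
    jdbc_fetch_size => 4000   # driver fetch size: 100000 / 4000 = 25 round trips per page
  }
}
```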

How many cores do you have? This affects the number of workers LS sets by default. The pipeline batch size defaults to 125, so with 8 workers the in-flight event count is 1000 (8 x 125). While the workers are off filtering and outputting those 1000 events, the jdbc input has to wait for the workers to collect more events from the queue (it is effectively a synchronous queue structure).
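To make that arithmetic concrete, the relevant settings live in logstash.yml (the figure of 8 cores is only an assumption for illustration):

```
# logstash.yml
pipeline.workers: 8        # defaults to the number of CPU cores
pipeline.batch.size: 125   # the default
# in-flight events = workers x batch size = 8 x 125 = 1000
```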

Outputting to Elasticsearch is also synchronous, meaning a poorly tuned ES cluster can limit LS throughput.

My suggestion, to minimise the idle waiting, is to make sure the jdbc input is off fetching records while the workers are filtering and outputting, i.e. each part is doing something rather than waiting on the other. In practice, setting jdbc_fetch_size and jdbc_page_size to workers x 125 might hammer the DB too much, so some multiple of that number could be just as good. You may get more throughput if you upped the batch size to 200 and changed the other numbers accordingly.
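A hedged sketch of that tuning, assuming 8 workers (adjust to your own core count; the 5x multiplier on the page size is just an example, not a recommendation):

```
# logstash.yml
pipeline.batch.size: 200   # up from the default of 125

# jdbc input section of the pipeline config
    jdbc_paging_enabled => true
    jdbc_fetch_size => 1600   # workers x batch size = 8 x 200
    jdbc_page_size => 8000    # some small multiple (here 5x) of the fetch size
```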

Hope this helps.
