How can I have multiple concurrent jdbc_streaming filter instances?

I have a config using the jdbc input plugin, plus 3 instances of the jdbc_streaming filter to gather the data I need to create an event. It all works, in the sense that events land in Elasticsearch the way I want them to. However, Logstash only makes one connection for each of the jdbc_streaming instances, seemingly regardless of what pipeline.workers is set to: I end up with 3 persistent connections for the filters, plus one transient connection for the input, whether pipeline.workers is set to 4 or 10.
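For context, the shape of the config in question is roughly the following. The connection strings, statements, and field names here are placeholders of my own, not taken from the actual setup:

```
input {
  jdbc {
    jdbc_connection_string => "jdbc:postgresql://db.example:5432/app"  # placeholder
    statement => "SELECT ..."
  }
}

filter {
  jdbc_streaming {
    statement => "SELECT ..."
    parameters => { "id" => "some_field" }  # placeholder field name
    target => "lookup_a"
  }
  # ...plus two more jdbc_streaming blocks, each with its own statement/target
}

output {
  elasticsearch { hosts => ["localhost:9200"] }
}
```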

I want to send more concurrent filter queries to speed up processing of large batches of events. I've been unable to find any documentation that might explain the behavior here. Can anybody clear things up for me?

The logstash-filter-jdbc_streaming plugin uses the Ruby Sequel library under the hood. Sequel itself supports connection pooling, but the plugin does not expose a way to configure the pool.

I've filed an issue on the project: logstash-plugins/logstash-filter-jdbc_streaming#9; feel free to chime in over there with specifics.

Thanks for that info, and I appreciate you doing that.

For my own understanding, is it fair to say that the effect of the pipeline.workers setting depends on which filters are actually configured? Based on the docs, with pipeline.workers set to 10 for my config (jdbc input, 3 jdbc_streaming filters, 1 elasticsearch output), I would have expected either 8 or 6 (a multiple of 3) workers, each with a single connection to the database.

Each pipeline worker thread takes a batch of events off this queue, runs the batch of events through the configured filters, and then runs the filtered events through any outputs.

Or, is it that a configured filter is a separate thread that a worker interacts with, rather than a "part" of a given worker's processing? So in my case, I'd have 8 workers interacting with the 3 filter threads.

Each pipeline worker doesn't get its own independent copy of each plugin; each filter is a shared resource for all of the workers in the pipeline.

In many cases, the filters are stateless and rely on no shared resources (e.g., grok, mutate, etc.), but when filters do have shared resources (such as a database connection), there can be contention for those resources that prevents all workers from using the filter simultaneously.

In this case, the jdbc_streaming filter has a connection pool under the hood to enable the worker threads to share available connections, but that pool isn't configured, so a single connection is being shared between all of the worker threads.
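As a rough illustration of why an unconfigured pool serializes the workers, here is a stdlib-only Ruby sketch (not the plugin's or Sequel's actual code) of worker threads sharing a fixed-size connection pool:

```ruby
require "thread"

# A minimal pool holding N "connections" shared by all pipeline workers.
# With a size of 1, workers must take turns on the single connection,
# which matches the behavior described above.
class ConnectionPool
  def initialize(size)
    @pool = SizedQueue.new(size)
    size.times { |i| @pool << "conn-#{i}" }
  end

  # Check a connection out, run the block, and always return the connection.
  def with_connection
    conn = @pool.pop   # blocks while every connection is checked out
    yield conn
  ensure
    @pool << conn
  end
end

pool = ConnectionPool.new(1)   # what an unconfigurable pool degenerates to
in_use = 0
peak = 0
lock = Mutex.new

workers = 8.times.map do
  Thread.new do
    10.times do
      pool.with_connection do |_conn|
        lock.synchronize { in_use += 1; peak = [peak, in_use].max }
        sleep 0.001      # simulate the database round trip
        lock.synchronize { in_use -= 1 }
      end
    end
  end
end
workers.each(&:join)

puts "peak concurrent connections in use: #{peak}"   # 1 with a pool of size 1
```

With a pool of size 1, the 8 worker threads never overlap on the database; raising the pool size is what would let them query concurrently.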

Got it, that's very helpful to understand. Thank you.

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.