How can I have multiple concurrent jdbc_streaming filter instances?

I have a config using the jdbc input plugin, plus 3 instances of the jdbc_streaming filter to gather the data I need to create an event. It all works, in the sense that events land in Elasticsearch the way I want them to. However, Logstash only makes one connection for each of the jdbc_streaming instances, seemingly regardless of what pipeline.workers is set to: I end up with 3 persistent connections for the filters, plus one transient connection for the input, whether pipeline.workers is set to 4 or 10.
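For context, the shape of the config in question is roughly the following. The connection strings, statements, and field names here are placeholders of my own, not taken from the actual setup:

```
input {
  jdbc {
    jdbc_connection_string => "jdbc:postgresql://db.example:5432/app"  # placeholder
    statement => "SELECT ..."
  }
}

filter {
  jdbc_streaming {
    statement => "SELECT ..."
    parameters => { "id" => "some_field" }  # placeholder field name
    target => "lookup_a"
  }
  # ...plus two more jdbc_streaming blocks, each with its own statement/target
}

output {
  elasticsearch { hosts => ["localhost:9200"] }
}
```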

I want to send more concurrent filter queries to speed up processing of large batches of events. I've been unable to find any documentation that might explain the behavior here. Can anybody clear things up for me?

The logstash-filter-jdbc_streaming plugin uses the Ruby Sequel library under the hood. Sequel itself supports connection pooling, but the plugin does not expose a way to configure the pool.

I've filed an issue on the project: logstash-plugins/logstash-filter-jdbc_streaming#9; feel free to chime in over there with specifics.

Thanks for that info, and I appreciate you doing that.

For my own understanding, is it fair to say that the effect of the pipeline.workers setting depends on which filters are actually configured? Based on the docs, with pipeline.workers set to 10 for my config (jdbc input, 3 jdbc_streaming filters, 1 elasticsearch output), I would have expected either 8 or 6 (a multiple of 3) workers, each with a single connection to the database.

Each pipeline worker thread takes a batch of events off this queue, runs the batch of events through the configured filters, and then runs the filtered events through any outputs.

Or, is it that a configured filter is a separate thread that a worker interacts with, rather than a "part" of a given worker's processing? So in my case, I'd have 8 workers interacting with the 3 filter threads.

Each pipeline worker doesn't get its own independent copy of each plugin; each filter is a shared resource for all of the workers in the pipeline.

In many cases, the filters are stateless and rely on no shared resources (e.g., grok, mutate, etc.), but when filters do have shared resources (such as a database connection), there can be contention for those resources that prevents all workers from using the filter simultaneously.

In this case, the jdbc_streaming filter has a connection pool under the hood to enable the worker threads to share available connections, but that pool isn't configured, so a single connection is being shared between all of the worker threads.
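As a rough illustration of why an unconfigured pool serializes the workers, here is a stdlib-only Ruby sketch (not the plugin's or Sequel's actual code) of worker threads sharing a fixed-size connection pool:

```ruby
require "thread"

# A minimal pool holding N "connections" shared by all pipeline workers.
# With a size of 1, workers must take turns on the single connection,
# which matches the behavior described above.
class ConnectionPool
  def initialize(size)
    @pool = SizedQueue.new(size)
    size.times { |i| @pool << "conn-#{i}" }
  end

  # Check a connection out, run the block, and always return the connection.
  def with_connection
    conn = @pool.pop   # blocks while every connection is checked out
    yield conn
  ensure
    @pool << conn
  end
end

pool = ConnectionPool.new(1)   # what an unconfigurable pool degenerates to
in_use = 0
peak = 0
lock = Mutex.new

workers = 8.times.map do
  Thread.new do
    10.times do
      pool.with_connection do |_conn|
        lock.synchronize { in_use += 1; peak = [peak, in_use].max }
        sleep 0.001      # simulate the database round trip
        lock.synchronize { in_use -= 1 }
      end
    end
  end
end
workers.each(&:join)

puts "peak concurrent connections in use: #{peak}"   # 1 with a pool of size 1
```

With a pool of size 1, the 8 worker threads never overlap on the database; raising the pool size is what would let them query concurrently.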

Got it, that's very helpful to understand. Thank you.

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.