JDBC_Streaming Filter Plugin PoolTimeout Issue

Hello everyone. I am pulling data into Logstash using the JDBC input plugin with a fetch size of 10,000, and for each of those records I am using the jdbc_streaming filter to perform another query that fetches child attributes for the parent document. The problem I am facing is that I keep receiving the following error from Logstash:

```
[timestamp] [WARN ] [logstash.filters.jdbcstreaming] Exception when executing JDBC query {:exception=>#<Sequel::PoolTimeout: timeout: 5.0, elapsed: 5.001>}
```

Most of my documents end up tagged with jdbc_streaming errors, and as far as I can tell the lookup queries never execute for them. How can I remedy this?
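For context, here is a minimal sketch of the kind of pipeline in play. The driver, connection details, table names, and fields below are placeholders, not the actual configuration:

```
input {
  jdbc {
    # Placeholder connection settings
    jdbc_driver_library => "/path/to/ojdbc8.jar"
    jdbc_driver_class => "Java::oracle.jdbc.driver.OracleDriver"
    jdbc_connection_string => "jdbc:oracle:thin:@db-host:1521/service"
    jdbc_user => "user"
    jdbc_password => "password"
    jdbc_fetch_size => 10000
    statement => "SELECT id, name FROM parent_table"
  }
}

filter {
  jdbc_streaming {
    # Same placeholder connection settings as the input
    jdbc_driver_library => "/path/to/ojdbc8.jar"
    jdbc_driver_class => "Java::oracle.jdbc.driver.OracleDriver"
    jdbc_connection_string => "jdbc:oracle:thin:@db-host:1521/service"
    jdbc_user => "user"
    jdbc_password => "password"
    # One lookup per event, keyed on the parent record's id
    statement => "SELECT attr FROM child_table WHERE parent_id = :id"
    parameters => { "id" => "id" }
    target => "child_attributes"
  }
}

output {
  elasticsearch {
    hosts => ["localhost:9200"]
  }
}
```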

From an issue on the Sequel repo...

PoolTimeout is raised when Sequel needs a connection from the connection pool, but the pool already has the maximum number of created connections, and all of them are already checked out by other threads.
In any case, this doesn't appear to be a bug in Sequel, as Sequel is correctly raising the PoolTimeout exception after the configured amount of time has passed.

The pool size is set by Sequel's max_connections setting, which defaults to 4. In the initially released version of the jdbc_streaming plugin this cannot be adjusted. You can try reducing the number of LS workers to 4, though.
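For example, the worker count can be capped with the standard -w/--pipeline.workers flag or in logstash.yml:

```
# On the command line:
bin/logstash -w 4 -f pipeline.conf

# Or in logstash.yml:
pipeline.workers: 4
```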

How long do your lookup queries take to execute?

These lookup queries are supposed to be fast, single-table, single-record queries, with the WHERE clause using an indexed column.


Will try reducing the number of worker threads. Are there any plans to allow more connections in this filter in the future? We have complex data use cases, and using the streaming filter as a full-fledged query service could help solve them.

Well, yes, I will add the settings to increase the pool size over the next few weeks.

Can you improve the statement performance? A materialized view? Or copy the table(s) to a simplified, indexed structure with a reduced record count? A sketch of the materialized-view idea follows.
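Purely for illustration, in Oracle syntax, this could look something like the following; the view, table, and column names are invented, and the right refresh strategy depends on how often the source data changes:

```sql
-- Hypothetical materialized view that pre-joins the lookup data
-- into one flat structure, refreshed on demand.
CREATE MATERIALIZED VIEW lookup_mv
  BUILD IMMEDIATE
  REFRESH COMPLETE ON DEMAND
AS
SELECT s.lookup_id, s.attr_one, t.attr_two
  FROM source_table s
  JOIN other_table t ON t.source_id = s.source_id;

-- Index the lookup column so each per-event query stays cheap.
CREATE INDEX lookup_mv_ix ON lookup_mv (lookup_id);
```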

What is the cardinality of the lookup identifier?

Ideally you want low cardinality for useful caching.
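If your plugin version exposes the cache options (worth checking in the jdbc_streaming docs for your release), low-cardinality keys let the local cache absorb most repeated lookups:

```
filter {
  jdbc_streaming {
    # Placeholder connection settings, as in the main pipeline
    jdbc_driver_class => "Java::oracle.jdbc.driver.OracleDriver"
    jdbc_connection_string => "jdbc:oracle:thin:@db-host:1521/service"
    statement => "SELECT attr FROM child_table WHERE parent_id = :id"
    parameters => { "id" => "id" }
    target => "child_attributes"
    use_cache => true        # cache results per distinct parameter set
    cache_size => 500        # maximum number of cached entries
    cache_expiration => 5.0  # seconds before an entry is re-fetched
  }
}
```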

Bear in mind that this is the scenario:
Threads 1-4 use connections Pool.1-4 to make lookup queries.
Threads 5-8 wait to get access to one of Pool.1-4.
Five seconds later...
Thread 5 gets a timeout exception.
Thread 1 returns Pool.1 to the pool.
Thread 6 gets Pool.1 and makes its lookup query.


Potentially we can do these things in the future; however, right now we cannot, as this is a POC project and my team does not own the tables we are querying. The inner JDBC query is a join across three tables that generally takes less than 10 seconds to return.

That is way too long. Should be milliseconds!

Are the 3 tables static, or frequently updated (> once a day)?

Another option is to have three jdbc_streaming filters, one for each table, with a field value added by the first jdbc_streaming filter used as a parameter in the second, and so on (see the sketch below). I have not tried three jdbc_streaming filters running in the same LS instance, though.
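A rough sketch of the chaining, with connection settings omitted for brevity and all table, column, and field names invented for illustration:

```
filter {
  jdbc_streaming {
    # First hop: the event's id -> matching row from table_a
    statement => "SELECT b_id, attr_a FROM table_a WHERE a_id = :a"
    parameters => { "a" => "a_id" }
    target => "a_lookup"
  }
  jdbc_streaming {
    # Second hop: reuse the b_id produced by the first lookup
    statement => "SELECT c_id, attr_b FROM table_b WHERE b_id = :b"
    parameters => { "b" => "[a_lookup][0][b_id]" }
    target => "b_lookup"
  }
  jdbc_streaming {
    # Third hop: reuse the c_id produced by the second lookup
    statement => "SELECT attr_c FROM table_c WHERE c_id = :c"
    parameters => { "c" => "[b_lookup][0][c_id]" }
    target => "c_lookup"
  }
}
```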

If you have success with 3 filters, please let me know. Unfortunately I don't have the DB resources and data sets to test this for myself.

Cardinality?

If you think that is long, you need to see how slow Oracle can get :tired_face: The problem with the three-filter streaming approach is that we need the distinct combination of fields across three different tables. I am sure we could get this done client-side; however, the reason we are using Logstash for ingestion of this magnitude in the first place is to avoid creating programmatic solutions, as we do this in other sectors and it is very costly.

The cardinality of the lookup query can be anywhere from 1:1 to 30:1; it varies with the input.

I get it.

One other design:
LS1: JDBC input (collect the three-table join) -> ES (index: static3).
LS2: JDBC input (original query) -> elasticsearch filter (query against static3) -> ES (index: original index).
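A hedged sketch of the LS2 filter stage, using the elasticsearch filter plugin; the hosts, query, and field names are placeholders:

```
filter {
  elasticsearch {
    hosts => ["localhost:9200"]
    index => "static3"
    # Find the pre-joined document matching this event's parent id
    query => "parent_id:%{[parent_id]}"
    # Copy fields from the matched document onto the event
    fields => { "attr_one" => "attr_one"
                "attr_two" => "attr_two" }
  }
}
```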
