Logstash jdbc_static filter plugin performing very poorly (nearly not at all)

Ugh, sorry, it's a tar.gz file

New link:

Looking at the thread dump, there are 32 worker threads for the "logs" pipeline.

"[logs]>worker0" is waiting to acquire a connection. So are threads 1, 3, 4, 6, 7, 8, 9, 10, 11, 12, 13, 14, 17, 18, 19, 22, 24, 25, 26, 28, 29, 30, 31.

"[logs]>worker2" is basically doing the same, it is just on a different wait on the mutex (as a result of a race condition -- something put the connection back into the pool, and this thread was notified, but failed to get the mutex because another thread beat it to it -- that is not a problem). So are threads 16, 20, 21.

"[logs]>worker5" is down under org.apache.derby.impl.jdbc.EmbedStatement.executeQuery, down under logstash.filters.jdbc.lookup.RUBY$method$fetch$0, doing a lookup in a jdbc_static filter. So are threads 15, 23, 27.

All the worker threads have been up for 2,507 seconds and have each used around 124 seconds of CPU. That tells me that they are all processing events. The fact that most of the threads are waiting for connections tells me the lookup queries are much more expensive that they should be. I cannot think why.

It would be really nice to be able to enable the logging that the underlying jdbc code has so that we could see the query execution times, but I do not see a way to do that without modifying the jdbc_static code.

Thank you. I would be willing to try to enable the logging you refer to, but so far I have only rebuilt Logstash plugins to test certain hypotheses.

I'll look at the jdbc_static code tomorrow to see if I can get at that logging setting without having to rebuild the sequel gem.

I'm going to mark this as resolved since I have decided to move on to using the translate filter with a CSV file that is periodically generated from SQL calls. After testings, this approach reduced the amount of CPU used and it is processing about 8000 events per second for each Logstash instance.

1 Like

Another option might be to use the jdbc_streaming plugin. As far as I recall it does not rely on a local database which might mean your dataset with tuning might be held in memory as it is quite small.

Thank you, I did consider that, but was biased against it by the behavior of jdbc_static, where the absence of any field to use as a lookup would generate a warning message for each such event.

This produced millions of warnings which I then ignored with the appropriate logging settings.

Now after looking at the jdbc_streaming code, maybe that doesn't act the same as jdbc_static. Regardless, I'm going ahead with the "translate" filter for now because I don't have to write a lot of extra code.