Issue using Logstash JDBC Input Plugin with Multiple Dynamic Tables

Hi All!

I'm implementing a Logstash solution to fetch events from an MSSQL database, where the relevant data is present:

  • In a fixed set of well-known database tables with fixed columns;

  • BUT the most important data lives in a dynamic number of additional tables - which can be created in the DB at any time - and those tables also have a dynamic number of columns (some are fixed and known, the rest can vary in number and name).

The set of well-known database tables with fixed fields doesn't present a challenge in Logstash - I have a separate JDBC Input for each table, each with a simple SQL statement and numeric column tracking, and it works like a charm.
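
For reference, each of those inputs looks more or less like this (paths, credentials and the table/column names below are just placeholders, not the real ones):

input {
  jdbc {
    # MSSQL connection details (placeholders)
    jdbc_driver_library    => "/opt/jdbc/mssql-jdbc.jar"
    jdbc_driver_class      => "com.microsoft.sqlserver.jdbc.SQLServerDriver"
    jdbc_connection_string => "jdbc:sqlserver://dbhost:1433;databaseName=mydb"
    jdbc_user              => "logstash"
    jdbc_password          => "secret"
    schedule               => "* * * * *"
    # One fixed, well-known table with a simple statement
    statement              => "SELECT * FROM fixed_table WHERE [id] > :sql_last_value"
    # Numeric column tracking on the table's id column
    use_column_value       => true
    tracking_column        => "id"
    tracking_column_type   => "numeric"
  }
}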

The challenge comes from the dynamic tables - dynamic both in number and in the columns they contain.

I already have a block of SQL code that works very well for generating data from those dynamic tables, but I'm unable to use it in a JDBC Input via the statement_filepath configuration, also with numeric column tracking.

It seems the SQL code is too complex to use together with statement_filepath - it simply generates a JDBC exception and no data is fetched...

Also, it seems that :sql_last_value is not being passed as a numeric value but as a timestamp, which is also strange...
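
In case it matters, this is roughly how the input for the dynamic SQL is configured (paths are placeholders). From what I understand of the plugin docs, :sql_last_value is only numeric when use_column_value is set to true and the tracking column is numeric; otherwise it holds the timestamp of the previous run:

input {
  jdbc {
    # Same MSSQL connection details as the other inputs (placeholders)
    jdbc_driver_library    => "/opt/jdbc/mssql-jdbc.jar"
    jdbc_driver_class      => "com.microsoft.sqlserver.jdbc.SQLServerDriver"
    jdbc_connection_string => "jdbc:sqlserver://dbhost:1433;databaseName=mydb"
    jdbc_user              => "logstash"
    jdbc_password          => "secret"
    schedule               => "* * * * *"
    # The dynamic T-SQL block lives in its own file
    statement_filepath     => "/etc/logstash/sql/dynamic_tables.sql"
    # Without use_column_value => true, :sql_last_value is the last run timestamp
    use_column_value       => true
    tracking_column        => "id"
    tracking_column_type   => "numeric"
    last_run_metadata_path => "/etc/logstash/.dynamic_tables_last_run"
  }
}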

Am I missing something?
How can this challenge be solved?

The SQL code is the following (for MSSQL):

DECLARE @TableName nvarchar(1024)
DECLARE @DynSQL nvarchar(MAX)

-- Enumerate every dynamic table matching the naming pattern
DECLARE cursor1 CURSOR FOR
    SELECT name
    FROM sys.tables
    WHERE name LIKE 'tablename_pattern_%'

SET @DynSQL = ''
OPEN cursor1
FETCH NEXT FROM cursor1 INTO @TableName

WHILE @@FETCH_STATUS = 0
BEGIN
    -- Append one SELECT per table, each filtered by the tracking column
    SET @DynSQL = @DynSQL + 'SELECT * FROM ' + QUOTENAME(@TableName)
                          + ' WHERE [id] > :sql_last_value ; '
    FETCH NEXT FROM cursor1 INTO @TableName
END

CLOSE cursor1
DEALLOCATE cursor1

EXEC sp_executesql @DynSQL

Some progress...

I created a Stored Procedure with the above SQL code, but when calling it from the statement setting with EXECUTE, I'm getting a JDBC exception:

[2018-06-06T18:23:01,008][ERROR][logstash.inputs.jdbc ] Java::ComMicrosoftSqlserverJdbc::SQLServerException: Incorrect syntax near the keyword 'EXECUTE'.: SELECT TOP (1) count(*) AS [COUNT] FROM (EXECUTE [dbo].[sp_FetchDynaTablesEntries] @TrackColumn = N'id', @LastValue = N'0') AS [T1]

Some more progress...
And it seems we have a Logstash BUG here!

I created another file that executes the Stored Procedure:

EXECUTE [dbo].[sp_FetchDynaTablesEntries] @TrackColumn = N'id', @LastValue = N':sql_last_value'

and referenced it with statement_filepath, BUT I'm getting the same JDBC Exception:

[ERROR][logstash.inputs.jdbc ] Java::ComMicrosoftSqlserverJdbc::SQLServerException: Incorrect syntax near the keyword 'EXECUTE'.: SELECT TOP (1) count(*) AS [COUNT] FROM (EXECUTE [dbo].[sp_FetchDynaTablesEntries] @TrackColumn = N'id', @LastValue = N'0') AS [T1]

WHEN I USE --log.level=debug!

If I use --log.level=error, the statement is executed...

So what gives?

This is a well known problem (to the LS developers at least).

When debug logging is enabled, we log a count of the records we are going to fetch.
However, since we don't build the SQL internally - we get it from you - and we don't have a universal SQL parser for all dialects, we wrap your statement in a sub-select to execute the count statement.

SELECT TOP (1) count(*) AS [COUNT] FROM (
EXECUTE [dbo].[sp_FetchDynaTablesEntries] @TrackColumn = N'id', @LastValue = N'0'
) AS [T1]

The first and last lines ^ are the wrapper. Your statement is on line 2.
Mostly this works quite well, but occasionally the statement will not parse.

One alternative we are considering is to give you the option of supplying a count_statement setting - however you would have to ensure that the result was a single row with one column called COUNT or count with an integer datatype.

Ok, I see...

So without debug logging I could probably use the MSSQL T-SQL code directly with statement_filepath, without needing to create a Stored Procedure, right?

I'm now having one more issue...
That code returns several result sets (a different one for each table), but I'm only getting some of them (rows from some tables appear, others do not).

What can be causing this?

It's worth a test - report back with your results for others to read :slight_smile:

I can't recall anyone returning multiple result sets and the input plugin handling it properly.

Maybe investigate multiple pipelines, each with a JDBC Input that loads from a specific Stored Procedure.
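
Roughly something like this for each pipeline - the procedure name, paths and parameter are only examples - with every config file registered as its own pipeline in pipelines.yml:

# e.g. /etc/logstash/pipelines/dyn_table_a.conf - one pipeline per stored procedure
input {
  jdbc {
    jdbc_driver_library    => "/opt/jdbc/mssql-jdbc.jar"
    jdbc_driver_class      => "com.microsoft.sqlserver.jdbc.SQLServerDriver"
    jdbc_connection_string => "jdbc:sqlserver://dbhost:1433;databaseName=mydb"
    jdbc_user              => "logstash"
    jdbc_password          => "secret"
    schedule               => "* * * * *"
    # Each pipeline calls one specific stored procedure
    statement              => "EXECUTE [dbo].[sp_FetchTableA] @LastValue = :sql_last_value"
    use_column_value       => true
    tracking_column        => "id"
    tracking_column_type   => "numeric"
    # Keep a separate tracking file per pipeline
    last_run_metadata_path => "/etc/logstash/.dyn_table_a_last_run"
  }
}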
