I'm implementing a Logstash solution to fetch events from MSSQL database with the relevant data present:
In a fixed set of well known database tables with fixed columns;
BUT also have the most important data in a dynamic number of additional tables - that will be created in the DB at any time - and those tables will also have a dynamic number of columns (some of them are fixed and known, the rest can vary in number and name);
The set of well known database tables with fixed fields don't present a challenge in Logstash - I have several JDBC Inputs present to deal with each table, each one with a simple SQL statement defined and numeric column tracking, and works like a charm as expected.
The challenge presents itself with the dynamic tables, both in number and in number of columns in them.
I already have a block of SQL code that works very well on generating data from those dynamic tables, but I'm unable to use it with a JDBC Input using a statement_path configuration, also with numeric column tracking.
It seems that the SQL code is too complex to use together with statement_path, it simply generates a JDBC Exception and no data is fetch...
Also, it seems that the :sql_last_value is not being passed as numeric but as a timestamp, which is also strange...
Am I missing something?
How this challenge can be solved?
The SQL code is the following (for MSSQL):
DECLARE @TableName nvarchar(1024)
DECLARE @DynSQL nvarchar(MAX)
DECLARE cursor1 CURSOR FOR
select name
from sys.tables
where name like 'tablename_pattern_%'
Set @DynSQL = ''
OPEN cursor1
FETCH NEXT FROM cursor1 INTO @TableName
WHILE @@FETCH_STATUS = 0
BEGIN
Set @DynSQL = @DynSQL + 'SELECT * FROM ' + @TableName
FETCH NEXT FROM cursor1
INTO @TableName
If @@FETCH_STATUS = 0
BEGIN
Set @DynSQL = @DynSQL + ' WHERE [id] > :sql_last_value ; '
END
END
CLOSE cursor1
DEALLOCATE cursor1
EXEC sp_executesql @DynSQL
Created an Stored Procedure with the above SQL Code, but with statement using EXECUTE, I'm getting an JDBC exception:
[2018-06-06T18:23:01,008][ERROR][logstash.inputs.jdbc ] Java::ComMicrosoftSqlserverJdbc::SQLServerException: Incorrect syntax near the keyword 'EXECUTE'.: SELECT TOP (1) count(*) AS [COUNT] FROM (EXECUTE [dbo].[sp_FetchDynaTablesEntries] @TrackColumn = N'id', @LastValue = N'0') AS [T1]
and referenced it with statement_filepath, BUT I'm getting the same JDBC Exception:
[ERROR][logstash.inputs.jdbc ] Java::ComMicrosoftSqlserverJdbc::SQLServerException: Incorrect syntax near the keyword 'EXECUTE'.: SELECT TOP (1) count(*) AS [COUNT] FROM (EXECUTE [dbo].[sp_FetchDynaTablesEntries] @TrackColumn = N'id', @LastValue = N'0') AS [T1]
WHEN I USE--log.level=debug !
If I use --log.level=error the statement is executed...
This is a well known problem (to the LS developers at least).
When set to debug logging, we log out a count of the records we are going to fetch.
However, as we internally don't build the SQL, we get it from you and we don't have a universal SQL parser for all dialects, so we wrap your statement as a sub select to execute the count statement.
SELECT TOP (1) count(*) AS [COUNT] FROM (
EXECUTE [dbo].[sp_FetchDynaTablesEntries] @TrackColumn = N'id', @LastValue = N'0'
) AS [T1]
The first and last lines ^ are the wrapper. Your statement is on line 2.
Mostly this works quite well, but occasionally the statement will not parse.
One alternative we are considering is to give you the option of supplying a count_statement setting - however you would have to ensure that the result was a single row with one column called COUNT or count having a integer datatype.
So without debug logging I would probably could use directly the MSSQL T-SQL Code with statement_filepath, without needing to make a Store Procedure, right?
I'm having now one more issue...
That code return several sets of results (one different for each table), but I'm only getting some of them (of some tables, others do not appear).
Apache, Apache Lucene, Apache Hadoop, Hadoop, HDFS and the yellow elephant
logo are trademarks of the
Apache Software Foundation
in the United States and/or other countries.