Many hours spent, many neurons lost, a few gray hairs earned... all in the battle to get sql_last_value refreshed in the .logstash_jdbc_last_run file.
My configuration:
MS SQL Server database table --> Logstash --> Elasticsearch --> Kibana
What is already done successfully:
Transferring all data from the MS SQL Server table to Elasticsearch via Logstash, with the ability to visualize those data in Kibana.
Scheduling also works.
Where I got stuck:
Making Logstash deliver only new records from the SQL table (i.e. configuring incremental deliveries).
My logstash.conf file:
input {
  jdbc {
    jdbc_driver_library => "D:\ELK\SQL_driver\sqljdbc_6.0\enu\jre8\sqljdbc42.jar"
    jdbc_driver_class => "com.microsoft.sqlserver.jdbc.SQLServerDriver"
    jdbc_connection_string => "jdbc:sqlserver://123.456.78.90:1433;databaseName=DB_NM"
    jdbc_user => "user"
    jdbc_password => "password"
    schedule => "*/5 * * * *"
    statement => "
      SELECT
        [SKID]
        ,[ClientFullName]
        ,[CreatedAt]
        ,convert(bigint, [VersionStamp]) as [VersionStamp]
      FROM [dbo].[DM_Clients]
      WHERE [SKID] > :sql_last_value
      ORDER BY [SKID]"
    use_column_value => true
    tracking_column_type => "numeric"
    tracking_column => "SKID"
    type => "clients"
    clean_run => false
    record_last_run => true
  }
}

filter {
  mutate { remove_field => ["@timestamp", "@version"] }
}

output {
  stdout { codec => json_lines }
  elasticsearch {
    hosts => "localhost:9220"
    manage_template => false
    action => "index"
    index => "%{type}"
    document_type => "%{type}"
    document_id => "%{skid}"
  }
}
My clients.mapping file:
{
  "clients": {
    "properties": {
      "skid":           { "index": "not_analyzed", "type": "long" },
      "clientfullname": { "index": "not_analyzed", "type": "string" },
      "createdat":      { "index": "not_analyzed", "type": "date" },
      "versionstamp":   { "index": "not_analyzed", "type": "long" }
    }
  }
}
What is happening:
The scheduled transfer works fine and completes in around 3 minutes. But every run transfers all the data instead of only the records with new SKID values.
I checked the .logstash_jdbc_last_run file after Logstash finished and found the value --- 0 there. After a number of scheduled runs the value is still the same.
I also tried starting from scratch, i.e. deleted the "clients" index in ES, deleted the mapping, created the index and mapping again, and started Logstash. The picture is the same: once it finishes, the value in .logstash_jdbc_last_run is --- 0.
I also experimented with the datetime column CreatedAt and configured that field as a "timestamp" tracking column for Logstash (the relevant options are sketched below). The result in the created .logstash_jdbc_last_run file is the same, only in datetime format, i.e. 1970-01-01 00:00:00.000000000 Z.
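For clarity, the relevant jdbc options in that experiment were roughly the following (everything else in the input block stayed as in the logstash.conf above; this is a sketch, not the exact text):

    statement => "
      SELECT
        [SKID]
        ,[ClientFullName]
        ,[CreatedAt]
        ,convert(bigint, [VersionStamp]) as [VersionStamp]
      FROM [dbo].[DM_Clients]
      WHERE [CreatedAt] > :sql_last_value
      ORDER BY [CreatedAt]"
    use_column_value => true
    tracking_column_type => "timestamp"
    tracking_column => "CreatedAt"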
What am I doing wrong?
Also, SQL Server has a special column type called "timestamp" (also known as rowversion); an example value is 0x0000000001068B33. My source table has such a column: Clients.VersionStamp.
This column makes it possible to capture not only new records but also updated ones, because SQL Server refreshes the "timestamp" value on every change. My final goal is to use Clients.VersionStamp for the incremental deliveries: convert Clients.VersionStamp to bigint in the SQL statement in logstash.conf and track Clients by that bigint (i.e. numeric) value, roughly as sketched below. Is it possible to track in this way or not?
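A rough, untested sketch of what I mean (in particular, I am not sure whether tracking_column should be the SQL alias "VersionStamp" or the lowercased field name that ends up in the Logstash event):

    statement => "
      SELECT
        [SKID]
        ,[ClientFullName]
        ,[CreatedAt]
        ,convert(bigint, [VersionStamp]) as [VersionStamp]
      FROM [dbo].[DM_Clients]
      WHERE convert(bigint, [VersionStamp]) > :sql_last_value
      ORDER BY convert(bigint, [VersionStamp])"
    use_column_value => true
    tracking_column_type => "numeric"
    # assumption: the alias from the SELECT is what gets tracked
    tracking_column => "VersionStamp"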