Hello,
I'm using the ELK stack 5.1.1 on a Linux machine.
I would like to aggregate information from different databases in ELK and query it to extract some statistics.
I started by importing one table from a SQL Server DB.
The data is imported correctly through the jdbc input plugin.
I'm facing some issues trying to keep the index updated with the table.
I'm importing with the config below:
input {
  jdbc {
    type => "DataLake"
    tags => [ "DataLake_SQLSRV" ]
    jdbc_driver_library => "/etc/logstash/drivers/sqljdbc42.jar"
    jdbc_driver_class => "com.microsoft.sqlserver.jdbc.SQLServerDriver"
    jdbc_connection_string => "jdbc:sqlserver://XXXXX:1533;databaseName=YYYY"
    jdbc_user => "YYYY"
    jdbc_password => "YYYY987"
    statement => "SELECT * FROM YYYY.dbo.SRC"
    jdbc_paging_enabled => "true"
    jdbc_page_size => "50000"
    record_last_run => true
    last_run_metadata_path => "/var/log/logstash/jdbc_DataLake_YYYY_last_run"
    # cron every 15 minutes:
    schedule => "*/15 * * * *"
  }
}
filter {
}
output {
  elasticsearch {
    action => "index"
    hosts => ["oratst:9200"]
    index => "logstash-datalake"
    document_id => "%{csis}%{cprc}%{camb}%{csrc}%{tver}"
    document_type => "DataLake"
  }
  stdout { codec => rubydebug }
}
What happens is that each time the scheduler fires a synchronization with the table, the data in the ES index is first deleted and then re-imported from the table.
This causes two problems:
- The process is slow and logically inefficient; can you suggest alternatives? (See the sketch after this list for the kind of incremental import I have in mind.)
- The @timestamp of each row is updated on every scheduler run, while I would like it updated only for the records that actually changed. Is that possible?
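For reference, the kind of incremental import I have in mind looks roughly like the sketch below. It reuses the connection settings from the config above and assumes the source table has a column recording when each row was last modified (the updated_at column here is hypothetical; my table may need one added). I'm not sure this is the right approach:
input {
  jdbc {
    # same driver/connection settings as in the config above
    jdbc_driver_library => "/etc/logstash/drivers/sqljdbc42.jar"
    jdbc_driver_class => "com.microsoft.sqlserver.jdbc.SQLServerDriver"
    jdbc_connection_string => "jdbc:sqlserver://XXXXX:1533;databaseName=YYYY"
    jdbc_user => "YYYY"
    jdbc_password => "YYYY987"
    # only fetch rows modified since the last run;
    # :sql_last_value is filled in by the plugin from the metadata file
    statement => "SELECT * FROM YYYY.dbo.SRC WHERE updated_at > :sql_last_value"
    # track the modification column's value instead of the run time
    use_column_value => true
    tracking_column => "updated_at"
    tracking_column_type => "timestamp"
    record_last_run => true
    last_run_metadata_path => "/var/log/logstash/jdbc_DataLake_YYYY_last_run"
    schedule => "*/15 * * * *"
  }
}
If I understand correctly, combined with the composite document_id in the output, changed rows would overwrite their existing documents and untouched rows would keep their @timestamp, but I'd appreciate confirmation that this is the recommended way to do it.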
Thanks in advance