Database table update best practices


#1

Hello,

I'm using ELK stack 5.1.1 on a Linux machine.

I would like to aggregate information from different databases through ELK and query it to extract some statistics.
I started by importing one table from a SQL Server database.
The data is imported correctly through the JDBC input plugin.

I'm facing some issues trying to keep the index in sync with the table.

I'm importing with the config below:

    input {
      jdbc {
        type => "DataLake"
        tags => [ "DataLake_SQLSRV" ]
        jdbc_driver_library => "/etc/logstash/drivers/sqljdbc42.jar"
        jdbc_driver_class => "com.microsoft.sqlserver.jdbc.SQLServerDriver"
        jdbc_connection_string => "jdbc:sqlserver://XXXXX:1533;databaseName=YYYY"
        jdbc_user => "YYYY"
        jdbc_password => "YYYY987"
        statement => "SELECT * FROM YYYY.dbo.SRC"
        jdbc_paging_enabled => "true"
        jdbc_page_size => "50000"
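        # persist the time of the last run to a metadata file so it survives restarts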
        record_last_run => true
        last_run_metadata_path => "/var/log/logstash/jdbc_DataLake_YYYY_last_run"
        # cron every 15 minutes:
        schedule => "*/15 * * * *"
      }
    }
    filter {
    }
    output {
      elasticsearch {
        action => "index"
        hosts => ["oratst:9200"]
        index => "logstash-datalake"
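        # composite document id built from five table columns, so a re-imported row
        # overwrites its existing document instead of creating a duplicate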
        document_id => "%{csis}%{cprc}%{camb}%{csrc}%{tver}"
        document_type => "DataLake"
      }
      stdout { codec => rubydebug }
    }

What happens is that each time the scheduler fires a synchronization with the table, the data in the ES index is first deleted and then re-imported in full from the table.
This causes two problems:

  1. The process is slow and inefficient; can you suggest alternatives? (I have sketched one idea below, after this list.)
  2. The @timestamp of each record is refreshed on every scheduler run, while I would like it to change only for records that were actually updated. Is that possible?
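
As one possible alternative, I was thinking of switching to an incremental import instead of re-reading the whole table on every run. This is only a rough sketch of what I have in mind: it assumes the SRC table has (or could be given) a LastModified datetime column, which it does not have today, and the tracking options and the upsert output are taken from the jdbc input and elasticsearch output documentation; I have not tested them on my setup:

    input {
      jdbc {
        type => "DataLake"
        tags => [ "DataLake_SQLSRV" ]
        jdbc_driver_library => "/etc/logstash/drivers/sqljdbc42.jar"
        jdbc_driver_class => "com.microsoft.sqlserver.jdbc.SQLServerDriver"
        jdbc_connection_string => "jdbc:sqlserver://XXXXX:1533;databaseName=YYYY"
        jdbc_user => "YYYY"
        jdbc_password => "YYYY987"
        # only fetch rows changed since the last run;
        # LastModified is a hypothetical column I would have to add to SRC
        statement => "SELECT * FROM YYYY.dbo.SRC WHERE LastModified > :sql_last_value"
        use_column_value => true
        tracking_column => "lastmodified"
        tracking_column_type => "timestamp"
        jdbc_paging_enabled => true
        jdbc_page_size => 50000
        record_last_run => true
        last_run_metadata_path => "/var/log/logstash/jdbc_DataLake_YYYY_last_run"
        schedule => "*/15 * * * *"
      }
    }
    output {
      elasticsearch {
        # update existing documents in place (upsert) instead of re-indexing them,
        # so only the rows returned by the incremental query are touched
        action => "update"
        doc_as_upsert => true
        hosts => ["oratst:9200"]
        index => "logstash-datalake"
        document_id => "%{csis}%{cprc}%{camb}%{csrc}%{tver}"
        document_type => "DataLake"
      }
    }

Would something along these lines be the recommended way to avoid the full re-import, or is there a better pattern?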

Thanks in advance


(system) #2

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.