Database table update best practices


I'm using the ELK stack 5.1.1 on a Linux machine.

I would like to aggregate information from different databases through ELK and query it to extract some statistics.
I started by importing one table from a SQL Server database.
The data are imported correctly through the jdbc input plug-in.

I'm facing some issues trying to keep the index updated with the table.

I'm importing with the configuration below:

    input {
      jdbc {
        type => "DataLake"
        tags => [ "DataLake_SQLSRV" ]
        jdbc_driver_library => "/etc/logstash/drivers/sqljdbc42.jar"
        # driver class for Microsoft's sqljdbc42.jar
        jdbc_driver_class => "com.microsoft.sqlserver.jdbc.SQLServerDriver"
        jdbc_connection_string => "jdbc:sqlserver://XXXXX:1533;databaseName=YYYY"
        jdbc_user => "YYYY"
        jdbc_password => "YYYY987"
        statement => "SELECT * FROM YYYY.dbo.SRC"
        jdbc_paging_enabled => true
        jdbc_page_size => 50000
        record_last_run => true
        last_run_metadata_path => "/var/log/logstash/jdbc_DataLake_YYYY_last_run"
        # cron schedule: every 15 minutes
        schedule => "*/15 * * * *"
      }
    }
    filter {
    }
    output {
      elasticsearch {
        action => "index"
        hosts => ["oratst:9200"]
        index => "logstash-datalake"
        document_id => "%{csis}%{cprc}%{camb}%{csrc}%{tver}"
        document_type => "DataLake"
      }
      stdout { codec => rubydebug }
    }
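As a side note on the `document_id` above: it is a plain concatenation of key columns, which can in principle collide (e.g. "ab"+"c" and "a"+"bc" produce the same string). A variant I considered is building the id with the fingerprint filter instead (a sketch, assuming the logstash-filter-fingerprint plugin is available; the column names are the ones from my config):

    filter {
      fingerprint {
        # hash the concatenation of the key columns into one stable id
        concatenate_sources => true
        source => ["csis", "cprc", "camb", "csrc", "tver"]
        method => "SHA1"
        target => "[@metadata][doc_id]"
      }
    }
    # then in the elasticsearch output:
    #   document_id => "%{[@metadata][doc_id]}"

Writing the hash to `[@metadata]` keeps it out of the stored document.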

What happens is that each time the scheduler fires a synchronization with the table, the data in the ES index are first deleted and then re-imported from the table.
This causes two problems:

  1. The process is too slow and clearly inefficient; can you suggest alternatives?
  2. The @timestamp of each row is rewritten on every scheduler run, while I would like it to change only for records that were actually updated. Is this possible?
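On point 1, the direction I'm exploring is selecting only rows changed since the last run, using the `:sql_last_value` parameter of the jdbc input (with `record_last_run => true` and no tracking column, it holds the time of the previous run). This is only a sketch and assumes the table has a last-modified column; `LastUpdate` below is a hypothetical name:

    input {
      jdbc {
        # ... same driver/connection settings as above ...
        # pull only rows modified since the previous scheduled run
        statement => "SELECT * FROM YYYY.dbo.SRC WHERE LastUpdate > :sql_last_value"
        record_last_run => true
        last_run_metadata_path => "/var/log/logstash/jdbc_DataLake_YYYY_last_run"
        schedule => "*/15 * * * *"
      }
    }

Combined with `document_id`, each changed row would then overwrite its existing document in place, so untouched rows keep their @timestamp.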

Thanks in advance
