Handling duplicates from SQL dumps/JSON and API responses in Logstash


(Mike W) #1

I am just about to start importing SQL dumps and API responses from one of our systems. But I have realised that those dumps and responses will contain largely the same information every time, and also:

  • some of the data might get updated (like tables with user details and last_login_time)
  • some of the data might get removed (user has removed their account)
  • some of the data might be added (new users added).

How do I handle this in ES? sincedb_path is no help at all; it is only useful for streaming data. Even when it did detect that the SQL dump had only one new record, the filters failed, because Logstash applied them to the new data only. Why? Because the dump is in JSON format, and the filter chain runs the split filter first, which obviously doesn't work on just the tiny piece of data that changed.

Any ideas?


(Mark Walkom) #2

Find some unique but static values to stitch together to form an _id, and then use that as the document ID. If an update occurs it will simply update (overwrite) the existing document.
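A minimal sketch of that approach using the fingerprint filter, assuming each event carries user_id and email fields (hypothetical names; substitute whatever stable columns your data actually has):

filter {
  fingerprint {
    # Hash the stable fields together into one deterministic value
    source => ["user_id", "email"]
    concatenate_sources => true
    method => "SHA1"
    target => "[@metadata][generated_id]"
  }
}

output {
  elasticsearch {
    hosts       => ["localhost:9200"]
    index       => "users"
    # Re-importing the same record produces the same _id, so it
    # overwrites the existing document instead of duplicating it
    document_id => "%{[@metadata][generated_id]}"
  }
}

Storing the hash under [@metadata] keeps it out of the indexed document itself.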


#3

Use the scheduler so that it re-runs the query every minute, as shown below.

The input code is fine; in the output, did you specify elasticsearch to get the result?

Try this:

input {
  jdbc {
    jdbc_driver_library => "xxxx\oracle-10g\ojdbc14.jar"
    jdbc_driver_class => "oracle.jdbc.driver.OracleDriver"
    jdbc_connection_string => "jdbc:oracle:thin:@localhost:1521:DATABASE"
    jdbc_user => "ROMAINROM"
    jdbc_password => "ROMAINROM"
    # TOP is SQL Server syntax; Oracle uses ROWNUM instead
    statement => "SELECT * FROM TABLE WHERE ROWNUM <= 10"
    jdbc_paging_enabled => "true"
    jdbc_page_size => "50000"
    # run the query once every minute
    schedule => "*/1 * * * *"
  }
}

output {
  elasticsearch { codec => json hosts => ["localhost:9200"] index => "index9" }
  stdout { codec => rubydebug }
}
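Note that re-running the full query on a schedule will duplicate every document unless the _id is deterministic, since Elasticsearch auto-generates a new _id for each event by default. A hedged sketch of the output, assuming the table has a primary-key column such as USER_ID (the jdbc input lowercases column names by default, so it arrives as the field user_id):

output {
  elasticsearch {
    hosts       => ["localhost:9200"]
    index       => "index9"
    # Deterministic _id from the primary key: each scheduled run
    # overwrites the existing document instead of adding a copy
    document_id => "%{user_id}"
  }
}

With this in place, updated rows overwrite their documents and new rows are added; deleted rows, however, still require separate handling.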

(Mike W) #5

I am sorry....WHAT?


(system) #6

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.