Incremental data on JDBC

I'm getting input from the JDBC plugin on logstash like this:

SELECT epoch,action_id,player_id,world_id,block_id FROM prism_data

The data looks like this:

{
         "epoch" => 1473046041,
     "action_id" => 27,
     "player_id" => 3,
      "world_id" => 3,
      "block_id" => 0,
      "@version" => "1",
    "@timestamp" => "2016-09-05T03:27:21.000Z"
}

What I want to do is find a way to check if the data has already been stored on my elasticsearch and if so then don't store it so I don't have duplicate data.

One of my ideas was using the epoch field instead of timestamp so I can get the time of the event instead of when the data was added.

Another idea was to truncate the table after getting the info from it so we ensure that all data stored will not be duplicated but I don't think that's the best way possible.

edit: The whole point of this is to get incremental data from a table so I have data synchronized with my elastic.

Solved it myself like this:

statement => "SELECT epoch,action_id,player_id,world_id,block_id FROM prism_data WHERE epoch >:sql_last_value"
use_column_value => true
tracking_column => epoch

"sql_last_value" stores the last value from the previous run so it will only add new data if the epoch is greater than last run. This prevents duplicates and guarantees incremental data.

5 Likes