Use logstash_checksum as tracking_column

Hi all,

Logstash creates a checksum from my row data just fine:

{
  "_index": "rel",
  "_type": "_doc",
  "_id": "2",
  "_version": 5,
  "_score": 0,
  "_source": {
    "datecreated": "2016-06-09T00:00:00.000Z",
    "datemodified": "2019-11-20T00:00:00.000Z",
    "modifiedby": null,
    "@timestamp": "2019-11-25T09:04:02.730Z",
    "rel_id": 4,
    "ts_id": null,
    "md_id": null,
    "createdby": "SYSTEM",
    "ng": false,
    "logstash_checksum": "5694ec39969161d4b74d716001f81c0b",
    "id": 2,
    "isactive": true,
    "@version": "1"
  }
}

But I get an error when I try to use this checksum as the tracking column for updated rows:

[main] Exception when executing JDBC query {:exception=>#<Sequel::DatabaseError: Java::ComMicrosoftSqlserverJdbc::SQLServerException: Incorrect syntax near '}'.>}

This is my conf file:

input {
  jdbc {
    # Query for testing purposes
    statement => "SELECT * from gln where {logstash_checksum} > :sql_last_value"
    tracking_column => "{logstash_checksum}"
    #tracking_column_type => "binary"

    use_column_value => true
    #clean_run => true
    schedule => "* * * * *"
  }
}
filter {
  checksum {
    algorithm => 'md5'
  }
  # fingerprint{}
}
output {
  elasticsearch {
    hosts => ["localhost:9200"]
    index => "rel"
    document_id => "%{id}"
  }
  stdout { codec => rubydebug }
}

Any idea what I'm doing wrong?

Thanks in advance.

Does the logstash_checksum column exist in the database? If so, remove the {}. If not, you cannot reference it in the input. I would expect it to get added by the checksum filter.
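If the column did exist, removing the braces would look like this (keeping your query otherwise unchanged):

    statement => "SELECT * from gln where logstash_checksum > :sql_last_value"
    tracking_column => "logstash_checksum"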

There is no reason to use md5. It is broken and has been for nearly two decades. Continuing to use it to detect accidental duplication just encourages folks to continue to use it for cryptographic purposes for which it is completely unfit. Why not use the default algorithm?
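You can simply omit the option and let the plugin use whatever default it ships with:

filter {
  checksum { }   # no algorithm option, so the plugin's default is used
}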

Hi Badger

The logstash_checksum column does not exist in my table. I just want a way to import my SQL data into Elasticsearch and check whether a row has been updated or not, so that not all of my data has to be pushed to Elasticsearch every time. Are there examples of how to manage that?

As for the algorithm, I was just trying something.

Thanks for the response.

If there is no tracking column in the database then you are going to have to fetch all of the data repeatedly and detect duplicates. You could do that with a checksum filter (or in any version of logstash from the last few years, a fingerprint filter) and then set the document_id on the elasticsearch output to the checksum.
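Something like this (a sketch; the source field list is just an example, use whichever columns define the row's content; by default the fingerprint filter writes its result to a field named fingerprint):

filter {
  fingerprint {
    # Hash the columns whose changes you care about; adjust to your table.
    source => ["id", "datemodified", "isactive"]
    concatenate_sources => true
    method => "SHA256"
  }
}
output {
  elasticsearch {
    hosts => ["localhost:9200"]
    index => "rel"
    # Identical rows hash to the same id, so re-fetching them just overwrites
    # the same document instead of creating duplicates.
    document_id => "%{fingerprint}"
  }
}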

Isn't there a huge load on your DB doing it this way? What if my tables have millions of rows and they all have to be fetched just to check whether anything has changed?

Or am I missing something?

Yes. But if there is no tracking column in the database there is no alternative. If you have a column that is a timestamp or sequence that tracks updates then use it, and you will only fetch updated rows.
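For example, assuming a datetime column such as datemodified (borrowing the name from the document you posted) that is set on every update, the input could look like this (connection settings omitted, as in your original config):

input {
  jdbc {
    # Only rows changed since the previous run are fetched; :sql_last_value
    # holds the highest datemodified value seen so far.
    statement => "SELECT * FROM gln WHERE datemodified > :sql_last_value"
    use_column_value => true
    tracking_column => "datemodified"
    tracking_column_type => "timestamp"
    schedule => "* * * * *"
  }
}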

I have a column in my table that is a binary timestamp, but Logstash imports it like this:

<Sequel::SQL::Blob:0x9b18 bytes=8 content="\u0000\u0000\u0000\u0000\u0001\xE6\u0016\xEF">
instead of
0x0000000001E616EF

Any idea why this happens?

No. You should create a new question about that.

I did

Anyway, thanks for the patience and help.
