Does the logstash_checksum column exist in the database? If so, remove the {}. If not, you cannot reference it in the input. I would expect it to get added by the checksum filter.
There is no reason to use md5. It is broken and has been for nearly two decades. Continuing to use it to detect accidental duplication just encourages folks to continue to use it for cryptographic purposes for which it is completely unfit. Why not use the default algorithm?
The logstash_checksum column does not exist in my table. I just want a way to import my SQL data into Elasticsearch and check whether a row has been updated, so that not all of my data has to be pushed to Elasticsearch every time. Are there examples of how to manage that?
As for the algorithm, I was just trying something.
If there is no tracking column in the database then you are going to have to fetch all of the data repeatedly and detect duplicates. You could do that with a checksum filter (or, in any version of Logstash from the last few years, a fingerprint filter) and then set the document_id on the elasticsearch output to the checksum, so that a row that has not changed overwrites its existing document instead of creating a new one.
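
Roughly, that could look like the sketch below. This is untested and the details are assumptions on my part: the `products` table, its `id`, `name` and `price` columns, the driver path, the connection string and the credentials are all placeholders, so substitute your own. SHA256 is set explicitly rather than relying on whatever the default is in your version.

```
input {
  jdbc {
    # Placeholder driver and connection settings; replace with your own.
    jdbc_driver_library => "/path/to/jdbc-driver.jar"
    jdbc_driver_class => "com.mysql.cj.jdbc.Driver"
    jdbc_connection_string => "jdbc:mysql://localhost:3306/mydb"
    jdbc_user => "user"
    jdbc_password => "password"
    # Every run fetches the whole table, because nothing tracks updates.
    schedule => "*/5 * * * *"
    statement => "SELECT id, name, price FROM products"
  }
}

filter {
  fingerprint {
    # Hash the row content; the column names here are hypothetical.
    source => ["id", "name", "price"]
    concatenate_sources => true
    method => "SHA256"
    # Keep the hash in @metadata so it is not indexed as a field.
    target => "[@metadata][fingerprint]"
  }
}

output {
  elasticsearch {
    hosts => ["http://localhost:9200"]
    index => "products"
    # An unchanged row produces the same id and overwrites itself.
    document_id => "%{[@metadata][fingerprint]}"
  }
}
```

Note that because the id is a hash of the row content, a row that has changed gets a new id and is indexed as a new document, and the old version stays in the index. If you would rather have updates replace the old document, use the table's primary key as the document_id instead.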
Isn't there a huge load on the DB doing it this way? What if my tables have millions of rows and they all have to be fetched just to check whether anything has changed?
Yes. But if there is no tracking column in the database there is no alternative. If you have a column that is a timestamp or sequence that tracks updates then use it, and you will only fetch updated rows.
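
If you do have such a column, a minimal sketch would be something like the following. Again the names are assumptions: I am pretending the table is called `products`, has a primary key `id`, and has an `updated_at` timestamp that is set on every insert and update. The connection settings are placeholders.

```
input {
  jdbc {
    # Placeholder driver and connection settings; replace with your own.
    jdbc_driver_library => "/path/to/jdbc-driver.jar"
    jdbc_driver_class => "com.mysql.cj.jdbc.Driver"
    jdbc_connection_string => "jdbc:mysql://localhost:3306/mydb"
    jdbc_user => "user"
    jdbc_password => "password"
    schedule => "* * * * *"
    # :sql_last_value holds the tracking column value saved by the previous run.
    statement => "SELECT * FROM products WHERE updated_at > :sql_last_value ORDER BY updated_at ASC"
    use_column_value => true
    tracking_column => "updated_at"
    tracking_column_type => "timestamp"
  }
}

output {
  elasticsearch {
    hosts => ["http://localhost:9200"]
    index => "products"
    # Using the primary key means an updated row replaces its old document.
    document_id => "%{id}"
  }
}
```

With this, each scheduled run only selects rows whose `updated_at` is later than the last value Logstash saw, so the full table is only read on the first run.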