A few years ago, in the JDBC river, I used checksums (CRC) to detect modified data. While this works "somehow", you should be aware that it imposes some restrictions on how the data set is allowed to behave:
- Is your data set allowed to grow, to be modified, or to shrink? All three cases must be recognized and followed by an appropriate action.
- Your SQL database has a different organization (the relational data model). This means a select operation creates rows that have an "ephemeral" identity: they cannot be identified later (unless you rely on primary keys only). If you repeat the operation, you may get a different order of rows (if no 'order by' clause is given) or a different count of rows. So what builds your documents must be specified somewhere besides the SQL statement. This is the relational/object impedance mismatch problem you have to solve in your application.
- Assuming you don't use a simple 1:1 mapping where one row builds one doc: if you merge several rows into one Elasticsearch document, you still have to detect what a modified row set in the database means for the Elasticsearch document. Will it shrink, expand, or even get deleted?
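
To make the three points concrete, here is a minimal sketch of checksum-based change detection between two runs of the same select. It is not the JDBC river's actual code; the function names (`doc_checksums`, `diff_runs`), the tuple row layout, and the choice of the first column as document key are assumptions for illustration only. Rows are grouped into documents by a key column, sorted before hashing so the row order of the result set does not matter, and the CRC32 values of two runs are compared to tell new, modified, and deleted documents apart.

```python
import zlib
from collections import defaultdict


def doc_checksums(rows, key_index=0):
    """Group rows into documents by a key column and return {key: crc32}.

    Hypothetical helper: assumes each row is a tuple and that the column at
    key_index is a stable primary key that identifies the target document.
    """
    grouped = defaultdict(list)
    for row in rows:
        grouped[row[key_index]].append(tuple(row))

    checksums = {}
    for key, members in grouped.items():
        # Sort the textual form of the rows so that a different row order
        # in the result set (no ORDER BY) yields the same checksum.
        payload = "\n".join(sorted(repr(m) for m in members))
        checksums[key] = zlib.crc32(payload.encode("utf-8"))
    return checksums


def diff_runs(previous, current):
    """Compare the checksum maps of two runs.

    Returns (created, modified, deleted) as sorted lists of document keys.
    """
    created = sorted(current.keys() - previous.keys())
    deleted = sorted(previous.keys() - current.keys())
    modified = sorted(
        key
        for key in current.keys() & previous.keys()
        if current[key] != previous[key]
    )
    return created, modified, deleted


# First run: document 1 is built from two rows, documents 2 and 3 from one.
first_run = [(1, "a"), (1, "b"), (2, "x"), (3, "y")]
# Second run: rows of document 1 arrive in a different order, document 2
# gained a row, document 3 disappeared, document 4 is new.
second_run = [(1, "b"), (1, "a"), (2, "x"), (2, "z"), (4, "new")]

created, modified, deleted = diff_runs(
    doc_checksums(first_run), doc_checksums(second_run)
)
print("index:", created, "reindex:", modified, "delete:", deleted)
```

Note that this only works because the key column gives each document a lasting identity; without it, there is nothing to compare between runs. The checksums of the previous run also have to be stored somewhere, and CRC32 is short enough that collisions (a modification that goes unnoticed) cannot be ruled out.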