I have an Oracle database with 80 million records. These have been migrated to elasticsearch and there is a continuous sync running that will keep the data updated in elasticsearch whenever data is added/changed in the Oracle database. I have two questions regarding this.
How to verify if all 80 million records have been indexed in ES? This might be fairly easy running count
How to verify the data consistency on a periodic basis, say each week to ensure that both data sources, Oracle and ES are in sync?
It depends. If you have for example the same id in Oracle and elasticsearch and also a timestamp for example, you can run a count per year in your database and an aggregation in elasticsearch and compare the number of documents per year. Then if you have a difference, you can compare months...
Then compare the ids of a month in the database with the ones in elasticsearch.
This is basically what I did in the past for a similar problem.
Apache, Apache Lucene, Apache Hadoop, Hadoop, HDFS and the yellow elephant
logo are trademarks of the
Apache Software Foundation
in the United States and/or other countries.