How to re-verify data consistency with external RDBMS source

ilyasjaan · July 12, 2017, 11:23am

Hi,

We have millions of data objects (about 50 million) stored in a conventional RDBMS, and we are trying to index these records as documents in ES.

So far so good: we have a job that does this, and it looks doable.

One question we have: how can we verify that the data in ES matches the transactional data in the RDBMS?

Our volume of updates is roughly equal to our volume of searches, which is why we need to be able to check whether ES is consistent with the RDBMS data store at any given time.

Is there any tool that can help verify the data in the indexes and compare it with the RDBMS?

How would you address this if a customer asked for it?

Many Thanks
M. Ilyas

dadoonet · July 12, 2017, 11:38am

First, Elasticsearch is not transactional.

You have multiple options:

Use a transactional message queue system in between. It will make sure that messages are delivered to Elasticsearch.
Do auditing: every night, run a batch job that counts documents in the RDBMS and in Elasticsearch. If the counts differ, reindex what is missing.
Add a "try-catch" block around the Elasticsearch index call, and whenever something goes wrong, log it or send it to a dead letter queue (again, a message queue system).
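A minimal sketch of the third option, under stated assumptions: the index call is passed in as a plain callable (`index_fn`, hypothetical; in practice it might wrap an Elasticsearch client call), and an in-memory `queue.Queue` stands in for a real dead letter queue on a message broker. The function name `index_with_dead_letter` is also illustrative.

```python
import logging
import queue
from typing import Any, Callable, Dict

logger = logging.getLogger("es_sync")

def index_with_dead_letter(
    doc_id: str,
    doc: Dict[str, Any],
    index_fn: Callable[[str, Dict[str, Any]], None],  # hypothetical wrapper around the ES index call
    dead_letters: queue.Queue,  # stand-in for a real dead letter queue
) -> bool:
    """Try to index one document; on any failure, log it and park it for a later retry."""
    try:
        index_fn(doc_id, doc)
        return True
    except Exception as exc:  # broad on purpose: every failure should be captured
        logger.warning("indexing %s failed: %s", doc_id, exc)
        dead_letters.put({"id": doc_id, "doc": doc, "error": str(exc)})
        return False

# Usage with a fake index function that fails for one document.
def flaky_index(doc_id: str, doc: Dict[str, Any]) -> None:
    if doc_id == "2":
        raise ConnectionError("cluster unavailable")

dlq: queue.Queue = queue.Queue()
results = [index_with_dead_letter(i, {"n": i}, flaky_index, dlq) for i in ["1", "2", "3"]]
```

A separate consumer can then drain the dead letter queue and retry, so a failed write is recorded rather than silently lost.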

Better: combine all of these. That's what I've been doing in the past.

ilyasjaan · July 13, 2017, 12:52pm

Hmmm...

That seems like a good option, but how can we efficiently check "what is missing"?

Are you proposing to take all primary keys in the RDBMS, look up documents in ES with the same keys, and see what is missing?

Couldn't that take a long time?
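One way to keep a key-by-key comparison affordable is to avoid per-key lookups altogether: stream the IDs from both sides in the same ascending order and walk the two streams once, merge-style. This sketch assumes both sides can return sorted IDs (e.g. `ORDER BY id` in SQL; on the ES side this would need a sorted scroll or similar, which is an assumption here). The function name `diff_sorted_ids` is illustrative.

```python
from typing import Iterable, List, Tuple

def diff_sorted_ids(rdbms_ids: Iterable[int], es_ids: Iterable[int]) -> Tuple[List[int], List[int]]:
    """Walk two ascending ID streams once; return (missing_in_es, extra_in_es)."""
    missing_in_es: List[int] = []
    extra_in_es: List[int] = []
    a, b = iter(rdbms_ids), iter(es_ids)
    done = object()  # marks an exhausted stream
    x, y = next(a, done), next(b, done)
    while x is not done or y is not done:
        if y is done or (x is not done and x < y):
            missing_in_es.append(x)  # in the RDBMS only
            x = next(a, done)
        elif x is done or y < x:
            extra_in_es.append(y)  # in ES only (e.g. deleted rows)
            y = next(b, done)
        else:  # same key on both sides
            x, y = next(a, done), next(b, done)
    return missing_in_es, extra_in_es

missing, extra = diff_sorted_ids([1, 2, 3, 5, 8], [1, 3, 4, 5])
# missing == [2, 8], extra == [4]
```

Because each stream is read only once, the cost is one pass over both key sets, and apart from the result lists memory use does not grow with table size.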

Secondly, missed updates remain a concern even when the document exists in ES. Are we saying we should trust the messaging log, i.e. if a message was sent, it must have been processed?
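Counts and key checks cannot detect a missed update. One possible approach (an assumption, not something proposed in this thread) is to copy a version number or last-modified timestamp from each RDBMS row into the ES document, then compare those values per key. The function `find_stale` and the idea of a version map on each side are hypothetical.

```python
from typing import Dict, List

def find_stale(rdbms_versions: Dict[str, int], es_versions: Dict[str, int]) -> List[str]:
    """Return IDs present in both stores whose ES copy is older than the RDBMS row.

    Values are assumed to be monotonically increasing versions or epoch timestamps.
    """
    return sorted(
        doc_id
        for doc_id, db_version in rdbms_versions.items()
        if doc_id in es_versions and es_versions[doc_id] < db_version
    )

stale = find_stale({"a": 3, "b": 5, "c": 1}, {"a": 3, "b": 4})
# stale == ["b"]; "c" is missing rather than stale, so a key check catches it instead
```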

dadoonet · July 13, 2017, 1:05pm

I was just doing a count per month on both sides.
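A per-month audit reduces to comparing two small maps of month to count. This minimal sketch assumes the RDBMS side comes from a `GROUP BY` on a month expression and the ES side from an aggregation, both already converted into dictionaries keyed by `"YYYY-MM"`. The function name `mismatched_months` is illustrative.

```python
from typing import Dict, List, Tuple

def mismatched_months(
    db_counts: Dict[str, int], es_counts: Dict[str, int]
) -> List[Tuple[str, int, int]]:
    """Return (month, db_count, es_count) for every month where the counts differ."""
    months = sorted(set(db_counts) | set(es_counts))
    return [
        (m, db_counts.get(m, 0), es_counts.get(m, 0))
        for m in months
        if db_counts.get(m, 0) != es_counts.get(m, 0)
    ]

diffs = mismatched_months(
    {"2017-05": 100, "2017-06": 120, "2017-07": 40},
    {"2017-05": 100, "2017-06": 118},
)
# diffs == [("2017-06", 120, 118), ("2017-07", 40, 0)]
```

Only the months flagged here would need a more expensive key-level comparison, which keeps the nightly job cheap when most months match.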

ilyasjaan · July 13, 2017, 1:06pm

Using an aggregation (faceting) on the ES side, I assume?
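On the ES side, a per-month count is typically a `date_histogram` aggregation with `"size": 0`, so no hits are returned. The sketch below only builds the request body and parses the response buckets into the month-to-count map. The date field name `created_at` is hypothetical, and the interval parameter depends on the version: `calendar_interval` from 7.2 onward, `interval` in older releases.

```python
from typing import Any, Dict

def monthly_count_body(date_field: str = "created_at") -> Dict[str, Any]:
    """Search body for per-month document counts ("created_at" is a hypothetical field)."""
    return {
        "size": 0,  # only the aggregation is needed, not the hits
        "aggs": {
            "per_month": {
                "date_histogram": {
                    "field": date_field,
                    "calendar_interval": "month",  # use "interval" before 7.2
                    "format": "yyyy-MM",  # controls each bucket's key_as_string
                }
            }
        },
    }

def parse_monthly_counts(response: Dict[str, Any]) -> Dict[str, int]:
    """Turn date_histogram buckets into {"YYYY-MM": doc_count}."""
    buckets = response["aggregations"]["per_month"]["buckets"]
    return {b["key_as_string"]: b["doc_count"] for b in buckets}

# Illustrative response shape, not real data.
sample = {"aggregations": {"per_month": {"buckets": [
    {"key_as_string": "2017-06", "key": 1496275200000, "doc_count": 118},
    {"key_as_string": "2017-07", "key": 1498867200000, "doc_count": 40},
]}}}
counts = parse_monthly_counts(sample)
```

The resulting dictionary has the same shape as a SQL `GROUP BY` month result, so the two can be compared directly.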

system · August 10, 2017, 1:06pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
