How to re-verify data consistency with an external RDBMS source

Hi,

We have about 50 million data objects stored in a regular RDBMS, and we are trying to index these records as documents in ES.

That part is fine: we have a job that does this, and it looks doable.

One question we have is: how can we verify that the data we have in ES is the same as the transactional data in the RDBMS?

We get about as many updates to the data as searches, which is why we need to be able to check whether ES is consistent with the RDBMS data store at any specific time.

Is there any tool available that can verify the data in the indexes and compare it with the RDBMS?

How would you address this situation if a customer asked about it?

Many Thanks
M. Ilyas

First, Elasticsearch is not transactional.

You have multiple options:

  • Use a transactional message queue system in between. It will make sure that messages are delivered to Elasticsearch.
  • Do auditing: every night, run a batch job that counts documents in the RDBMS and in Elasticsearch. If the counts differ, reindex what is missing.
  • Wrap the Elasticsearch index call in a "try-catch" block, and whenever something goes wrong, log it or send it to a dead letter queue (again using the message queue system).
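
The third option can be sketched roughly as follows. This is only a minimal illustration, not a definitive implementation: `index_all`, `index_fn` and `dead_letters` are hypothetical names, `index_fn` stands in for whatever call actually sends one document to Elasticsearch, and a plain list stands in for a real dead letter queue.

```python
from typing import Callable, Dict, Iterable, List, Tuple

Doc = Dict[str, object]


def index_all(
    docs: Iterable[Doc],
    index_fn: Callable[[Doc], None],
    dead_letters: List[Tuple[Doc, str]],
) -> int:
    """Index every document; park failures instead of losing them.

    index_fn is an assumed callable that sends one document to
    Elasticsearch and raises on failure. dead_letters is a stand-in
    for a real dead letter queue.
    """
    indexed = 0
    for doc in docs:
        try:
            index_fn(doc)
            indexed += 1
        except Exception as exc:
            # Keep the document and the reason so it can be replayed later.
            dead_letters.append((doc, repr(exc)))
    return indexed
```

Anything that ends up in `dead_letters` can be retried later or reported by the nightly audit, so a failed index call does not silently become a missing document.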

Better: combine all of that. That's what I've done in the past.


Hmmm...

That seems like a good option, but how efficiently can we check "what is missing"?

Are you proposing to check all primary keys in the RDBMS, look up the documents in ES for the same keys, and see what is missing?

This can take a long time, can't it?
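
For reference, a full key comparison does not have to mean one ES lookup per key. Below is a minimal sketch, assuming both sides can stream their primary keys in the same ascending order (for example an `ORDER BY` on the RDBMS side and a sorted scan on a numeric ID field on the ES side). The function name `diff_sorted_ids` is hypothetical, and the keys are assumed to be unique integers.

```python
from typing import Iterable, List, Tuple


def diff_sorted_ids(
    db_ids: Iterable[int], es_ids: Iterable[int]
) -> Tuple[List[int], List[int]]:
    """Return (missing_in_es, extra_in_es) for two ascending ID streams.

    Both inputs are assumed to be sorted ascending with no duplicates.
    Each stream is read once, so memory use depends on the number of
    differences, not on the total number of keys.
    """
    missing: List[int] = []
    extra: List[int] = []
    db_iter, es_iter = iter(db_ids), iter(es_ids)
    db_id = next(db_iter, None)
    es_id = next(es_iter, None)
    while db_id is not None and es_id is not None:
        if db_id == es_id:
            db_id = next(db_iter, None)
            es_id = next(es_iter, None)
        elif db_id < es_id:
            missing.append(db_id)  # in the RDBMS but not in ES
            db_id = next(db_iter, None)
        else:
            extra.append(es_id)  # in ES but not in the RDBMS
            es_id = next(es_iter, None)
    # Drain whichever stream still has keys left.
    while db_id is not None:
        missing.append(db_id)
        db_id = next(db_iter, None)
    while es_id is not None:
        extra.append(es_id)
        es_id = next(es_iter, None)
    return missing, extra
```

This only finds documents that are missing or left over. It says nothing about documents that exist on both sides but with different content.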

Secondly, missed updates will still be a problem even when the document exists in ES. Are we saying we should trust the messaging log, i.e. that if a message has been sent it must have been processed?

I was just doing a count per month on both sides.
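
A per-month count comparison could look something like the sketch below. Everything specific here is an assumption for illustration: the table name `records`, the date field `created_at`, the aggregation name `per_month`, and the use of SQLite (with its `strftime` function) as the RDBMS. The ES side is shown only as a request body for a `date_histogram` aggregation plus a parser for its response. The `calendar_interval` parameter is the name used by recent Elasticsearch versions, while older versions use `interval` instead.

```python
import sqlite3
from typing import Dict, Tuple

# Assumed request body for the ES side: one bucket per calendar month.
ES_MONTHLY_COUNTS_QUERY = {
    "size": 0,
    "aggs": {
        "per_month": {
            "date_histogram": {
                "field": "created_at",  # hypothetical date field
                "calendar_interval": "month",
                "format": "yyyy-MM",
            }
        }
    },
}


def es_counts(response: dict) -> Dict[str, int]:
    """Turn the aggregation response into {"yyyy-MM": doc_count}."""
    buckets = response["aggregations"]["per_month"]["buckets"]
    return {b["key_as_string"]: b["doc_count"] for b in buckets}


def rdbms_counts(conn: sqlite3.Connection) -> Dict[str, int]:
    """Count rows per month in the hypothetical `records` table."""
    rows = conn.execute(
        "SELECT strftime('%Y-%m', created_at) AS month, COUNT(*) "
        "FROM records GROUP BY month"
    ).fetchall()
    return dict(rows)


def mismatched_months(
    db: Dict[str, int], es: Dict[str, int]
) -> Dict[str, Tuple[int, int]]:
    """Return {month: (rdbms_count, es_count)} where the counts differ."""
    return {
        month: (db.get(month, 0), es.get(month, 0))
        for month in sorted(set(db) | set(es))
        if db.get(month, 0) != es.get(month, 0)
    }
```

Only the months that come back from `mismatched_months` need a closer look or a reindex, which keeps the nightly audit cheap. Matching counts do not prove that updates were applied, only that no documents are missing or left over in that month.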

Using an aggregation (faceting), I believe? (On the ES side?)
