Hello,
I'm facing a challenge and need your expertise. In our system, a real-time process adds or removes documents in an Elasticsearch index based on changes in an RDBMS. Alongside it, a batch process periodically refreshes the entire index.
Consider this scenario:
- At time T1, the real-time process detects a record deletion in the RDBMS and deletes the corresponding document from Elasticsearch.
- At time T2, our batch process, unaware of the recent change, brings in an older snapshot of data that still includes the deleted record, potentially "resurrecting" the document in Elasticsearch.
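To make the sequence concrete, here is a minimal sketch of the race. It makes no real Elasticsearch or RDBMS calls: a plain Python dict stands in for the index, and the names `index`, `snapshot`, `realtime_delete`, and `batch_refresh` are hypothetical, chosen only for illustration.

```python
# In-memory stand-ins only; nothing here talks to Elasticsearch or the RDBMS.
index = {"42": {"name": "widget"}}      # dict standing in for the Elasticsearch index
snapshot = {"42": {"name": "widget"}}   # batch snapshot taken before T1


def realtime_delete(doc_id):
    """T1: the real-time process removes the document after the RDBMS delete."""
    index.pop(doc_id, None)


def batch_refresh(stale_snapshot):
    """T2: the batch process re-indexes every record in its stale snapshot."""
    for doc_id, doc in stale_snapshot.items():
        index[doc_id] = doc


realtime_delete("42")
deleted_after_t1 = "42" not in index    # the document is gone after T1

batch_refresh(snapshot)
resurrected_after_t2 = "42" in index    # the stale snapshot brings it back at T2

print(deleted_after_t1, resurrected_after_t2)
```

Because the batch write carries no information about when its data was read, nothing in this sketch lets the index tell the stale write from a fresh one.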
I hope this example clarifies the dilemma.
Are there any built-in mechanisms or best practices within Elasticsearch to handle such situations? Or would I be required to implement safeguards at the application level to ensure data integrity between the two processes?
Thank you in advance for your insights and suggestions!