Hello,
I'm facing a challenge and need your expertise. In our system, a real-time process adds or removes documents in an Elasticsearch index based on changes in an RDBMS. In addition, a batch process periodically refreshes the entire index.
Consider this scenario:
- At time T1, a real-time process detects a record deletion in the RDBMS and subsequently deletes the corresponding document from Elasticsearch.
- At time T2, our batch process, unaware of the recent change, brings in an older snapshot of data that still includes the previously deleted record, thus potentially "resurrecting" the document in Elasticsearch.
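To make the sequence concrete, here is a minimal in-memory sketch of the race. It is only an illustration: plain Python dicts stand in for the RDBMS table and the Elasticsearch index, no real client calls are made, and all names and sample records are hypothetical.

```python
# Assumption: dicts stand in for the RDBMS table and the Elasticsearch index.
rdbms = {1: "alice", 2: "bob"}
index = dict(rdbms)

# The batch process takes its snapshot before the deletion happens.
snapshot = dict(rdbms)

# T1: the real-time process sees record 2 deleted and removes the document.
del rdbms[2]
index.pop(2, None)
deleted_at_t1 = 2 not in index

# T2: the batch process writes its stale snapshot back into the index.
index.update(snapshot)

# Document 2 is back in the index even though the RDBMS no longer has it.
resurrected = 2 in index and 2 not in rdbms
```

After T2 the index and the RDBMS disagree about record 2, which is exactly the state I want to prevent.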
I hope this example clarifies the dilemma.
Are there any built-in mechanisms or best practices within Elasticsearch to handle such situations? Or would I be required to implement safeguards at the application level to ensure data integrity between the two processes?
Thank you in advance for your insights and suggestions!