Managing Real-time and Batch Processing in Elasticsearch to Prevent Document Resurrection

Hello,

I'm facing a challenge and would appreciate your expertise. Our system has a real-time process that adds or removes documents in an Elasticsearch index as records change in an RDBMS. Alongside it, a batch process periodically refreshes the entire index.

Consider this scenario:

  • At time T1, the real-time process detects a record deletion in the RDBMS and deletes the corresponding document from Elasticsearch.
  • At time T2, the batch process, unaware of that deletion, loads an older snapshot that still contains the deleted record and so "resurrects" the document in Elasticsearch.

I hope this example clarifies the dilemma.
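
To make the race concrete, here is a minimal sketch of the sequence in plain Python. The dictionary `index` is only a stand-in for the Elasticsearch index, and `realtime_delete`, `batch_refresh`, and the document id "42" are hypothetical names for illustration, not our actual code or any Elasticsearch API.

```python
# Hypothetical in-memory stand-in for the index: doc id -> document body.
index = {}


def realtime_delete(doc_id):
    """Real-time process: remove the document when the RDBMS row is deleted."""
    index.pop(doc_id, None)


def batch_refresh(snapshot):
    """Batch process: blindly write every record found in the snapshot."""
    for doc_id, body in snapshot.items():
        index[doc_id] = body


# Before T1: the document exists and the batch job takes its snapshot.
index["42"] = {"name": "widget"}
snapshot = dict(index)

# T1: the row is deleted in the RDBMS, so the real-time process deletes the doc.
realtime_delete("42")
deleted_at_t1 = "42" not in index

# T2: the batch job applies its older snapshot and the document comes back.
batch_refresh(snapshot)
resurrected = "42" in index
```

After T2, `resurrected` is true even though the source record no longer exists, which is exactly the outcome I want to prevent.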

Are there any built-in mechanisms or best practices within Elasticsearch to handle such situations? Or would I be required to implement safeguards at the application level to ensure data integrity between the two processes?
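
As a rough idea of what an application-level safeguard could look like, the sketch below attaches a version (for example, an RDBMS change timestamp) to every write and rejects any write that is not newer than the last one applied, keeping the version of deleted documents as a tombstone. Everything here is an assumption for illustration: `GuardedIndex` is a hypothetical in-memory class, not an Elasticsearch API, and the version numbers are made up.

```python
class GuardedIndex:
    """Hypothetical in-memory store that rejects stale writes by version."""

    def __init__(self):
        self.docs = {}      # doc id -> document body
        self.versions = {}  # doc id -> last applied version (kept after deletes)

    def _accept(self, doc_id, version):
        # Only accept a write that is strictly newer than the last one applied.
        if version <= self.versions.get(doc_id, -1):
            return False
        self.versions[doc_id] = version
        return True

    def upsert(self, doc_id, body, version):
        if not self._accept(doc_id, version):
            return False
        self.docs[doc_id] = body
        return True

    def delete(self, doc_id, version):
        if not self._accept(doc_id, version):
            return False
        # The body is removed, but the version stays behind as a tombstone.
        self.docs.pop(doc_id, None)
        return True


idx = GuardedIndex()
idx.upsert("42", {"name": "widget"}, version=100)  # initial load
idx.delete("42", version=200)                      # T1: real-time delete

# T2: the batch job replays a snapshot taken at version 150, before the delete.
stale_write_applied = idx.upsert("42", {"name": "widget"}, version=150)
still_deleted = "42" not in idx.docs
```

The catch with this approach is that tombstones have to be retained at least as long as the oldest snapshot the batch job might replay, which is part of why I'm asking whether Elasticsearch offers something built in.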

Thank you in advance for your insights and suggestions!
