Here's a quick description of our project:
The analysis results in ElasticSearch only need to be current as of "day-1" (not real time).
Data source: a SQL Server database.
Extract, transform, then load the data into an ElasticSearch index.
We ran a POC that yielded interesting results, but now we need to look at how this would work in production.
The problem is that the source data changes (inserts, deletes, updates), but we have no way to calculate the delta between "day" and "day-1" in terms of data updates (the database has no attributes that would allow this).
You will agree that
cost(computing the delta between two extraction results, "day" and "day-1") +
cost(loading the data into ES)
could be very expensive.
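
To make the delta cost more concrete, here is a minimal sketch of one way the comparison could be done when the database offers no change tracking: fingerprint every extracted row and compare the two snapshots by key. It assumes each row has a stable primary key (called `id` here, which is my assumption, as are all function names). Both full extractions still have to be read and hashed, which is where the cost comes from.

```python
import hashlib
import json


def row_fingerprint(row):
    """Hash a row's content; keys are sorted so column order does not matter."""
    payload = json.dumps(row, sort_keys=True, default=str)
    return hashlib.sha1(payload.encode("utf-8")).hexdigest()


def snapshot(rows, key="id"):
    """Map each primary key (assumed column name: 'id') to its row fingerprint."""
    return {row[key]: row_fingerprint(row) for row in rows}


def compute_delta(previous, current):
    """Compare two snapshots and return (added, deleted, updated) key lists."""
    added = sorted(k for k in current if k not in previous)
    deleted = sorted(k for k in previous if k not in current)
    updated = sorted(
        k for k in current if k in previous and current[k] != previous[k]
    )
    return added, deleted, updated


# Illustrative data only: two small extractions, "day-1" and "day".
day_minus_1 = snapshot([
    {"id": 1, "name": "alpha"},
    {"id": 2, "name": "beta"},
])
day = snapshot([
    {"id": 2, "name": "beta (renamed)"},
    {"id": 3, "name": "gamma"},
])

added, deleted, updated = compute_delta(day_minus_1, day)
```

Only the "day-1" fingerprints need to be kept between runs, not the full rows, but the whole source still has to be extracted every night to build the "day" snapshot.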
The remaining solution is to rebuild everything from scratch every night, i.e.:
- Delete Index (all types, all documents)
- Extract data from scratch (from the SqlServer database)
- Rebuild the index.
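
The three steps above could be sketched roughly as follows. This is only an illustration: `InMemoryIndexStore` is a hypothetical stand-in I made up so the example runs without a cluster, and it is not the real ElasticSearch client API. The `bulk_body` helper builds the newline-delimited JSON shape used by the `_bulk` endpoint (one action line followed by one document line per row); as far as I know, older versions with mapping types also expect a `_type` in the action line. The `id` key column is again an assumption.

```python
import json


def bulk_body(index_name, rows, key="id"):
    """Build an NDJSON payload: one action line, then one document line, per row."""
    lines = []
    for row in rows:
        action = {"index": {"_index": index_name, "_id": str(row[key])}}
        lines.append(json.dumps(action))
        lines.append(json.dumps(row, default=str))
    return "\n".join(lines) + "\n"


class InMemoryIndexStore:
    """Hypothetical stand-in for a search cluster, so the sketch runs offline."""

    def __init__(self):
        self.indices = {}

    def delete_index(self, name):
        # Deleting a missing index is treated as a no-op here.
        self.indices.pop(name, None)

    def create_index(self, name):
        self.indices[name] = {}

    def bulk(self, body):
        lines = body.splitlines()
        # Lines alternate: action metadata, then the document itself.
        for action_line, doc_line in zip(lines[0::2], lines[1::2]):
            meta = json.loads(action_line)["index"]
            self.indices[meta["_index"]][meta["_id"]] = json.loads(doc_line)


def nightly_rebuild(store, index_name, extract_rows):
    """Delete the index, extract everything from the source, reload it all."""
    store.delete_index(index_name)
    store.create_index(index_name)
    rows = extract_rows()  # in practice: a full query against SQL Server
    store.bulk(bulk_body(index_name, rows))
    return len(rows)
```

For example, `nightly_rebuild(store, "sales", extract)` with an `extract` function returning the full list of rows would leave the `sales` index containing exactly those rows and nothing from the previous night.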
Since I have no experience with this, I would like to have your opinion on this strategy.
Is this the best solution? Thanks