Hello,
Our app uses a Spark job to read from Kinesis and write to Elasticsearch.
We ran into a situation where the job failed to write to Elasticsearch but kept reading from Kinesis, which caused us to lose data.
Trying to figure out a solution, we thought about using the Kinesis checkpoint (sequence number), which is unique, as the Elasticsearch document _id.
This would mean that on any failure we could simply roll back to a known checkpoint and restart the job, which would just overwrite existing documents (if any).
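To make the idea concrete, here is a minimal sketch (plain Python, no Spark or Elasticsearch dependencies; the sequence numbers and documents are made up) of why a deterministic _id makes replays idempotent. The "index" is just a dict keyed by the Kinesis sequence number, standing in for the document _id. With the elasticsearch-hadoop connector, the equivalent would presumably be setting `es.mapping.id` to the field holding the sequence number, so that a replayed record overwrites instead of duplicating.

```python
def write_batch(index, records):
    """Upsert records keyed by their sequence number (the would-be _id)."""
    for seq, doc in records:
        index[seq] = doc  # same key -> overwrite, never a duplicate

index = {}
batch = [("seq-001", {"value": 1}), ("seq-002", {"value": 2})]

write_batch(index, batch)
# Simulate rolling back to the checkpoint and replaying the same batch:
write_batch(index, batch)

assert len(index) == 2  # the replay overwrote; no duplicate documents
```

The trade-off is that letting Elasticsearch auto-generate ids is cheaper at index time (no lookup for an existing document), so using a custom _id typically costs some write throughput.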
What do you think?
Is using a custom _id a proper solution?
How would it affect Elasticsearch indexing performance?
Thanks,
Shushu