I'm trying to develop a database solution for some event data coming off an embedded industrial system. Unfortunately, because of the way it is designed, it generates a lot of duplicate events. Some are genuine exact duplicates and others are updates to previous events (e.g. filling in the end time of an event after it has finished).
Of course, I could use the Update API's upsert functionality to merge these into the existing documents. However, I won't know the document ID of the event I'm trying to update when a new event arrives. From what I can tell (and admittedly I'm a complete Elasticsearch newbie, so feel free to correct me!), you need to know the ID of the document you're upserting in order to use the Update API.
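To illustrate where I get stuck, here is a rough sketch of an upsert request as I understand it. It only builds the request and does not send anything. The index name `events`, the field names, and the helper `build_upsert_request` are made up for illustration, and I'm assuming the typeless `/{index}/_update/{id}` endpoint form used by recent Elasticsearch versions together with the `doc_as_upsert` flag.

```python
import json


def build_upsert_request(index, doc_id, event):
    """Build (method, path, body) for an Update API upsert.

    Hypothetical helper: it only assembles the request, it does not send it.
    Assumes the typeless endpoint form /{index}/_update/{id}.
    """
    if not doc_id:
        # This is my problem: the request is addressed by document ID,
        # and I don't have one when a new event arrives.
        raise ValueError("the Update API needs a document ID")
    path = f"/{index}/_update/{doc_id}"
    # doc_as_upsert: use the partial document as the full document
    # if nothing exists under this ID yet.
    body = {"doc": event, "doc_as_upsert": True}
    return "POST", path, json.dumps(body)
```

The document ID ends up in the URL path, which is exactly the piece of information I don't have for an incoming event.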
In traditional SQL land I'd have to do a SELECT then an UPDATE/INSERT as two independent operations. Do I have to follow the same programming model in Elasticsearch, or is there a better, more efficient way? I don't mind if the process has to happen asynchronously in the background with eventual consistency, as long as the system doesn't grind to a halt (which is what's currently happening to the SQL database!).
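For reference, this is roughly the two-step SQL pattern I mean, sketched with Python's built-in `sqlite3` module. The table layout (`source`, `start_time`, `end_time`) and the function names are simplified stand-ins for illustration, not my real schema.

```python
import sqlite3


def make_db():
    """Create an in-memory database with a simplified, made-up events table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE events ("
        "id INTEGER PRIMARY KEY, source TEXT, start_time TEXT, end_time TEXT)"
    )
    return conn


def save_event(conn, source, start_time, end_time=None):
    """SELECT first, then INSERT or UPDATE as a second, separate statement."""
    row = conn.execute(
        "SELECT id, end_time FROM events WHERE source = ? AND start_time = ?",
        (source, start_time),
    ).fetchone()
    if row is None:
        # Event not seen before: insert it.
        conn.execute(
            "INSERT INTO events (source, start_time, end_time) VALUES (?, ?, ?)",
            (source, start_time, end_time),
        )
        return "inserted"
    if end_time is not None and row[1] != end_time:
        # Same event, but now with its end time filled in: update it.
        conn.execute("UPDATE events SET end_time = ? WHERE id = ?", (end_time, row[0]))
        return "updated"
    # Exact duplicate: nothing to do.
    return "duplicate"
```

Every incoming event costs a lookup plus a possible write, which is the part I'd like to avoid repeating in Elasticsearch if there is a better way.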