I have a set of data files that get uploaded by clients periodically. Each file is a complete collection of items, some of which may have changed since the last upload. However, the data does not include any kind of last-update timestamp, so the only way to identify changes is to compare the data for a given item to what was in the previous file. In total, there are hundreds of thousands of items to be processed.
What I'm hoping to do is index this data in Elasticsearch, using a consistent ID for each item, and have a timestamp field on the document updated only when the index operation is not a noop (that is, only when some of the item's data actually changed). I could then query Elasticsearch to find all the items that were truly updated since yesterday (or any other cutoff).
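To make the intent concrete, here is a rough plain-Python sketch of the behavior I'm after. It does not use Elasticsearch at all: a dict stands in for the index, and `upsert_item`, `changed_since`, and the `last_changed` field are hypothetical names I made up for illustration.

```python
from datetime import datetime, timezone


def upsert_item(store, item_id, data, now=None):
    """Write `data` under `item_id`, touching `last_changed` only on a real change.

    `store` is a plain dict standing in for the index (an assumption for this
    sketch). Returns "created", "updated", or "noop".
    """
    if now is None:
        now = datetime.now(timezone.utc)
    existing = store.get(item_id)
    if existing is None:
        store[item_id] = {"data": dict(data), "last_changed": now}
        return "created"
    if existing["data"] == data:
        # Identical content: leave the document and its timestamp alone.
        return "noop"
    store[item_id] = {"data": dict(data), "last_changed": now}
    return "updated"


def changed_since(store, cutoff):
    """Return the sorted IDs of items created or changed at or after `cutoff`."""
    return sorted(
        item_id for item_id, doc in store.items() if doc["last_changed"] >= cutoff
    )
```

Re-uploading a full file would call `upsert_item` once per item, and only the items whose data differed from the previous upload would show up in `changed_since(store, yesterday)`.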
Is there any way to achieve this?