Update docs only when fields change and store time and content of update


I'm importing data periodically. Some of the documents are completely new, whereas others include modifications to only certain fields.

Using the bulk API, I'd like to index the new documents and store a created time on each. For modified documents, I'd like to update them only when fields have actually changed, storing at least an updated time and, ideally, a record of what changed (so that ultimately I could view revisions over time).

Is this possible? What would be the recommended way to do this?

(Nik Everett) #2

It's common. Usually you use a Groovy script for the update. The script can
detect whether anything changed, set the operation to noop (ctx.op = "none")
if nothing did, and set the last modified time if something did. You have to
write the Groovy for it yourself. See the documentation for scripted updates.
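
To make the logic concrete, here is a rough sketch in plain Python of what such a script would need to decide for each imported record. It is not Groovy and it does not call Elasticsearch; the function name apply_import and the field names created, updated and revisions are made up for illustration, and it assumes the incoming record never carries those three fields itself.

```python
from copy import deepcopy


def apply_import(existing, incoming, now):
    """Decide what to do with one imported record.

    existing: the stored document as a dict, or None if it is new.
    incoming: the (possibly partial) imported fields as a dict.
    now:      timestamp to record, supplied by the caller.

    Returns (op, doc) where op is "create", "update" or "noop".
    Field names "created", "updated" and "revisions" are hypothetical.
    """
    if existing is None:
        # Brand new document: store it with a created time.
        doc = deepcopy(incoming)
        doc["created"] = now
        doc["revisions"] = []
        return "create", doc

    # Compare only the fields that the import actually supplies.
    changes = {}
    for field, value in incoming.items():
        if field not in existing or existing[field] != value:
            changes[field] = {"old": existing.get(field), "new": value}

    if not changes:
        # Nothing differs: leave the stored document untouched.
        return "noop", existing

    # Something differs: merge, stamp the time, and log what changed.
    doc = deepcopy(existing)
    doc.update(deepcopy(incoming))
    doc["updated"] = now
    doc.setdefault("revisions", []).append({"at": now, "changes": changes})
    return "update", doc


# Example: the same record imported three times.
op1, stored = apply_import(None, {"title": "a", "size": 1}, "t1")
op2, stored = apply_import(stored, {"title": "a"}, "t2")
op3, stored = apply_import(stored, {"title": "b"}, "t3")
```

In a real scripted update the "noop" branch corresponds to setting the operation to noop, the "create" branch to the upsert document, and the "update" branch to the script modifying the source in place. Keeping the revisions list inside the document is only one option and it grows with every change, so a separate revisions index may suit long histories better.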

The wikimedia-extra plugin supports some of what you are after in a
declarative way with its super_detect_noop feature, though not the last
modified time.

