What would be the most performant way to maintain a creation timestamp and last update timestamp for each doc?
I am indexing documents which have a set ID. When there is a new ID, the creation timestamp is set. When fields other than the ID are changed in a given document, the last update timestamp is set.
I'm assuming this would need to be done in a script using the update API. If so, do I need to manually compare all fields in the old and new documents, ignoring the timestamp fields? If they are the same, I would set the operation to noop; if they differ, I would update the document and the last update timestamp while preserving the creation timestamp.
I see there is a detect_noop feature in the update API. Can it be configured to ignore specific fields (like the timestamps)? Can its value be accessed in the script so we can perform operations if no change is detected?
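The comparison described in the question can be sketched outside Elasticsearch as plain Python, just to pin down the intended behaviour. This is only an illustration of the logic, not an update script: the field names `created_at` and `updated_at` and the function names are assumptions, since the question does not name them.

```python
from datetime import datetime, timezone

# Hypothetical timestamp field names; the question does not specify them.
TIMESTAMP_FIELDS = {"created_at", "updated_at"}


def strip_timestamps(doc):
    """Return a copy of doc without the timestamp fields."""
    return {k: v for k, v in doc.items() if k not in TIMESTAMP_FIELDS}


def merge_with_timestamps(existing, incoming, now=None):
    """Decide what to store for an incoming document.

    Returns (operation, document), where operation is
    "create", "update" or "noop".
    """
    now = now or datetime.now(timezone.utc).isoformat()
    payload = strip_timestamps(incoming)

    # New ID: nothing stored yet, so only the creation timestamp is set.
    if existing is None:
        return "create", {**payload, "created_at": now}

    # Same content once timestamps are ignored: leave the document alone.
    if strip_timestamps(existing) == payload:
        return "noop", existing

    # Content changed: keep the creation timestamp, refresh the update one.
    return "update", {
        **payload,
        "created_at": existing["created_at"],
        "updated_at": now,
    }
```

Passing `existing=None` models a new ID; passing the stored document models a re-index of a known ID. An update script would need to make the same three-way decision.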
You could try to create an ingest pipeline for setting the created and updated timestamps. In a pipeline you can both add new document fields and modify existing ones. Ref:
https://www.elastic.co/guide/en/elasticsearch/reference/5.5/put-pipeline-api.html
You can create fairly complex scripts using the Painless scripting language inside such pipelines. Here's an example of a pipeline we use at my company for setting the processing time on our documents:
{
  "description" : "Sets the document processingtime",
  "processors" : [
    {
      "script" : {
        "lang" : "painless",
        "inline" : "DateFormat df = new SimpleDateFormat(\"yyyy-MM-dd'T'HH:mm:ss.SSS'Z'\");\ndf.setTimeZone(TimeZone.getTimeZone(\"UTC\"));\nDate date = new Date();\nctx.processingtime = df.format(date);"
      }
    }
  ]
}
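The Painless source in the pipeline above contains double quotes, and inside a JSON string those have to be escaped as \" for the body to parse. One way to avoid hand-escaping is to build the body programmatically. The following is a minimal sketch in Python using only the standard library; it only produces the JSON text and does not talk to a cluster.

```python
import json

# Painless source from the pipeline above, written as ordinary text
# so that json.dumps takes care of escaping quotes and newlines.
PAINLESS_SOURCE = "\n".join([
    "DateFormat df = new SimpleDateFormat(\"yyyy-MM-dd'T'HH:mm:ss.SSS'Z'\");",
    "df.setTimeZone(TimeZone.getTimeZone(\"UTC\"));",
    "Date date = new Date();",
    "ctx.processingtime = df.format(date);",
])

pipeline = {
    "description": "Sets the document processingtime",
    "processors": [
        {"script": {"lang": "painless", "inline": PAINLESS_SOURCE}}
    ],
}

# Serialize to the request body; embedded quotes become \" and
# line breaks in the script become \n.
body = json.dumps(pipeline, indent=2)
print(body)
```

The resulting body can be sent with the put pipeline API from the link above (PUT _ingest/pipeline/<pipeline-id>, where the pipeline id is whatever name you choose), and documents are then indexed with the pipeline=<pipeline-id> request parameter. As far as I know, a pipeline only sees the incoming document, so on its own it stamps every document that passes through rather than distinguishing new IDs from existing ones.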
It's fairly efficient too: we run millions of docs through this pipeline every day without any noticeable delay in our indexing.
Good luck!