I'm using the following pipeline to add a creation timestamp to docs, but the timestamp changes every time I put a newer version of the same doc into Elasticsearch.
I resolved the issue by adding op_type=create as a query parameter when indexing docs using PUT or POST (see docs). This way Elasticsearch returns error code 409 when the document already exists; I catch this error while indexing documents and simply ignore it.
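A minimal sketch of that workaround in Python, using only the standard library. The cluster URL, the function name `create_only` and the `opener` parameter are assumptions for illustration, not something from the original post; the only Elasticsearch-specific parts are the `op_type=create` and `pipeline` query parameters and the 409 status code.

```python
import json
import urllib.error
import urllib.request

ES_URL = "http://localhost:9200"  # assumption: local cluster without authentication


def create_only(index, doc_id, doc, pipeline=None, opener=urllib.request.urlopen):
    """Index `doc` only if `doc_id` does not exist yet.

    Returns True if the document was created and False if it already
    existed (Elasticsearch answers 409 Conflict in that case).
    """
    url = f"{ES_URL}/{index}/_doc/{doc_id}?op_type=create"
    if pipeline:
        url += f"&pipeline={pipeline}"
    request = urllib.request.Request(
        url,
        data=json.dumps(doc).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )
    try:
        with opener(request):
            return True
    except urllib.error.HTTPError as err:
        if err.code == 409:
            # Document already exists: keep the stored copy and its timestamp.
            return False
        raise  # any other HTTP error is a real failure
```

Calling `create_only("test", "1", {"name": "a"}, pipeline="set_creation_date")` twice should create the document the first time and leave it untouched the second time, so the creation timestamp written by the pipeline is preserved.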
If you use the index API again, it will be treated as a new index operation. If you want to update a document, take a look at the update API.
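For illustration, a rough sketch of what a request body for the update API could look like when the creation time should only be written once. It assumes the typeless endpoint `POST /test/_update/1`, a hypothetical field name `created_at`, and a timestamp set by the client rather than by an ingest pipeline. The `doc` part is merged into an existing document, while the `upsert` part is only used when the document does not exist yet.

```python
import json
from datetime import datetime, timezone


def build_upsert_body(fields, now=None):
    """Build an update-API body that sets `created_at` only on first insert.

    `created_at` is a hypothetical field name chosen for this sketch.
    """
    now = now or datetime.now(timezone.utc)
    return {
        # Merged into the stored document if it already exists.
        "doc": dict(fields),
        # Indexed as-is only if the document does not exist yet.
        "upsert": {**fields, "created_at": now.isoformat()},
    }


body = build_upsert_body({"name": "a"})
print(json.dumps(body, indent=2))
```

Because `created_at` appears only under `upsert`, later updates of the same id would leave the stored value alone.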
Hi, thanks @spinscale.
I think there is a misunderstanding.
I know the difference between POST and PUT.
Even if I use:
PUT /test/_doc/1?pipeline=set_creation_date
{"name": "a"}
the end result is the same: the creation time changes every time I update the doc.
My specific use case is that I am crawling news from some RSS feeds, hashing the news body (main text), and using this hash as the id when indexing news articles in Elasticsearch. The crawler runs as a cron job every hour, so the same article can be fetched more than once and would otherwise end up as duplicates with different ids. With the hash-based id there are (approximately) no duplicate articles in Elasticsearch, but the creation time changes each time I PUT a previously added article.
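The id generation could be sketched roughly like this. SHA-256 and the function name `article_id` are assumptions here, since the post does not say which hash is used.

```python
import hashlib


def article_id(body_text):
    """Derive a stable document id from the article's main text."""
    return hashlib.sha256(body_text.encode("utf-8")).hexdigest()


# The same text always maps to the same id, so re-crawled articles
# target the same document instead of creating a duplicate.
first = article_id("Some news body")
second = article_id("Some news body")
print(first == second)
```

Any change in the text, even whitespace, produces a different id, which is why duplicates are only approximately avoided.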