Add creation (not update) time of doc using ingest

I'm using the following pipeline to add a creation timestamp to docs, but the timestamp changes every time I put a newer version of the same doc into Elasticsearch.

PUT _ingest/pipeline/set_creation_date
{
    "description": "Set creation date",
    "processors": [
      {
        "script": {
          "source": "ctx.created_at = new Date();"
        }
      }
    ]
}

Is there a way to add the creation timestamp only the first time a doc is added, and ignore subsequent updates?

I resolved the issue by adding op_type=create as a query parameter when indexing docs using PUT or POST (see docs). This way Elasticsearch returns error code 409 for a doc that already exists, and I catch this error when indexing documents and just ignore it!
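
A minimal Python sketch of this approach, using only the standard library and assuming a cluster reachable at a base URL you supply. The helper names build_create_request and index_once are hypothetical; the pipeline name is the one from the question.

```python
import json
import urllib.error
import urllib.parse
import urllib.request


def build_create_request(base_url, index, doc_id, doc, pipeline="set_creation_date"):
    """Build an index request that only succeeds if doc_id does not exist yet."""
    query = urllib.parse.urlencode({"op_type": "create", "pipeline": pipeline})
    url = f"{base_url}/{index}/_doc/{urllib.parse.quote(doc_id, safe='')}?{query}"
    return urllib.request.Request(
        url,
        data=json.dumps(doc).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


def index_once(base_url, index, doc_id, doc):
    """Return True if the document was created, False if it already existed."""
    request = build_create_request(base_url, index, doc_id, doc)
    try:
        with urllib.request.urlopen(request):
            return True
    except urllib.error.HTTPError as err:
        if err.code == 409:  # version conflict: the document is already indexed
            return False
        raise  # any other HTTP error is a real failure
```

Note that with this approach an existing document is left completely untouched, not just its created_at field.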

I am not a hundred percent sure that this solves your issues, as the whole document does not get updated in that case.

How about checking for the existence of ctx.created_at in the script like

if (ctx.created_at != null) {
  ctx.created_at = ...
}

Wouldn't that work as well?

Thanks @spinscale. Yes, the document does not update, and in my specific application that is the desired behavior.

Very good idea, it should work, but I tried it and I wonder why it is not working!
I used the following pipeline as you mentioned:

PUT _ingest/pipeline/set_creation_date
{
    "description": "Set creation date",
    "processors": [
      {
        "script": {
          "source": "if (ctx.created_at != null) {ctx.created_at = new Date();}"
        }
      }
    ]
}

It doesn't add created_at even on the first POST request!

I think this should be if (ctx.created_at == null)

POST _ingest/pipeline/_simulate
{
  "docs": [
    {
      "_source": {
        "foo": "bar"
      }
    }
  ],
  "pipeline": {
    "processors": [
      {
        "script": {
          "source": "if (ctx.created_at == null) {ctx.created_at = new Date();}"
        }
      }
    ]
  }
}

@spinscale
That does not work either!
Actually I tried both in Kibana dev tools, and they didn't work!

Please provide a fully reproducible simulate ingest pipeline call, including the response. Thanks!

@spinscale
Sorry for the late response.

I used the following command in Kibana Dev Tools to create an ingest pipeline that sets the creation date:

PUT _ingest/pipeline/set_creation_date
{
    "description": "Set creation date",
    "processors": [
      {
        "script": {
          "source": "if (ctx.created_at == null) {ctx.created_at = new Date();}"
        }
      }
    ]
}

Then I post a doc to a test index with id 1 as follows:

POST /test/_doc/1?pipeline=set_creation_date
{"name": "a"}

Then I get it to see the creation time:

GET /test/_doc/1

It shows the doc which has a creation time like:

{
  "_index" : "test",
  "_type" : "_doc",
  "_id" : "1",
  "_version" : 1,
  "_seq_no" : 0,
  "_primary_term" : 1,
  "found" : true,
  "_source" : {
    "name" : "a",
    "created_at" : "2020-02-17T06:36:17.423Z"
  }
}

But when I repeat the two steps above and index the same doc with id 1, the creation time changes!

{
  "_index" : "test",
  "_type" : "_doc",
  "_id" : "1",
  "_version" : 2,
  "_seq_no" : 1,
  "_primary_term" : 1,
  "found" : true,
  "_source" : {
    "name" : "a",
    "created_at" : "2020-02-17T06:36:50.327Z"
  }
}

Hey,

If you are using the index API again, then this will be treated like a new index operation. If you want to update a document, take a look at the update API.

See https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-update.html

--Alex
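
To illustrate why the == null check does not prevent the overwrite: an ingest pipeline only sees the source of the incoming request, not the copy already stored in the index, so created_at is missing every time the script runs. Below is a toy Python model of that behavior. The function names and the dict-based store are made up for illustration; this is not how Elasticsearch is implemented.

```python
def set_creation_date(ctx, now):
    """Mirror of the Painless script: set created_at only if it is absent from ctx."""
    if ctx.get("created_at") is None:
        ctx["created_at"] = now
    return ctx


def index_doc(store, doc_id, source, now):
    """Toy index operation: the pipeline receives only the request body as ctx."""
    ctx = dict(source)  # built from the incoming request, not from store[doc_id]
    store[doc_id] = set_creation_date(ctx, now)  # replaces any previous version
    return store[doc_id]


store = {}
first = index_doc(store, "1", {"name": "a"}, "2020-02-17T06:36:17.423Z")
second = index_doc(store, "1", {"name": "a"}, "2020-02-17T06:36:50.327Z")
```

In this model the second call stores the later timestamp, which matches the two GET responses shown earlier in the thread.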

Hi, thanks @spinscale.
I think there is a misunderstanding.
I know the difference between POST and PUT.
Even if I use:

PUT /test/_doc/1?pipeline=set_creation_date
{"name": "a"}

the end result is the same: the creation time changes every time I update the doc.

My specific use case is that I am crawling news from some RSS feeds, hashing the news body (main text), and using this hash as the id to index news articles in Elasticsearch. The crawler runs as a cron job every hour, so the same article may be fetched more than once and would otherwise end up duplicated under different ids. Generating the id from a hash of the article text means there are (approximately) no duplicate articles in Elasticsearch, but the creation time changes each time I PUT a previously added article.
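
For reference, the id generation described here could look roughly like the Python sketch below. The choice of SHA-256, the whitespace normalization, and the function name article_id are assumptions for illustration; the post does not say which hash is used.

```python
import hashlib


def article_id(body: str) -> str:
    """Derive a stable document id from the article text."""
    # Assumption: collapse whitespace so trivial formatting differences
    # between crawls do not produce a different id.
    normalized = " ".join(body.split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()
```

The same text always maps to the same id, so re-crawling an article targets the existing document instead of creating a new one.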

I did not talk about the difference between POST and PUT, but about a completely different endpoint.
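
One way to use the update API for this, sketched in Python: send the fields as doc and put created_at only in upsert, so it is written when the document is first created and left alone on later updates. Assumptions: the Elasticsearch 7.x endpoint POST /test/_update/1, a timestamp generated on the client rather than by the ingest pipeline, and a hypothetical helper name build_upsert_body.

```python
import json
from datetime import datetime, timezone


def build_upsert_body(doc, now=None):
    """Build an update API body that sets created_at only on first creation."""
    now = now or datetime.now(timezone.utc)
    created_at = now.isoformat(timespec="milliseconds").replace("+00:00", "Z")
    return {
        # Merged into the document if it already exists.
        "doc": doc,
        # Used as the full document if it does not exist yet.
        "upsert": {**doc, "created_at": created_at},
    }


# Body to send as: POST /test/_update/1
body = build_upsert_body(
    {"name": "a"},
    now=datetime(2020, 2, 17, 6, 36, 17, 423000, tzinfo=timezone.utc),
)
print(json.dumps(body, indent=2))
```

Because created_at never appears in doc, an update to an existing document cannot overwrite the original value.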

Oh, yes! That is another trick, I will try it, thanks!
