Hi.
I'm quite new to Elasticsearch. I'm using the Python client (v8.12.0).
I'd like to add the timestamp fields created_at and updated_at to every document in my index.
Reading various docs, I think I have to use IngestClient, in quite a convoluted way... To start with, I don't even understand how I should install it (using pip?).
Can anybody guide me to add two simple timestamp fields, created_at and updated_at, which are supposed to be automatically filled by ES on document creation/update?
The timestamp will automatically be added through the pipeline you set (there is a sketch of creating such a pipeline right after the search example below). So when you search through your documents you will see that the field has been filled:
query = {
    "match": {
        "foo": "bar"
    }
}
response = es.search(index=index_name, query=query)
for hit in response["hits"]["hits"]:
    print(hit["_source"])
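For reference, creating such a pipeline with the Python client could look roughly like this - no separate install is needed, the ingest client ships with the elasticsearch package and is available as es.ingest. The pipeline id timestamp-pipeline is just a placeholder, and attaching it through the index.default_pipeline setting is one way to have it run on every index operation:

from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")  # adjust to your cluster

# Ingest pipeline with a single set processor that stamps each document
# with the ingest timestamp.
es.ingest.put_pipeline(
    id="timestamp-pipeline",  # placeholder name
    description="Add a created_at timestamp to every document",
    processors=[
        {
            "set": {
                "field": "created_at",
                "value": "{{_ingest.timestamp}}",
            }
        }
    ],
)

# Attach the pipeline to the index so it runs on every index operation.
es.indices.create(
    index=index_name,
    settings={"index.default_pipeline": "timestamp-pipeline"},
)

index.default_pipeline is a dynamic setting, so if the index already exists it can also be applied with a settings update (es.indices.put_settings) instead of at creation time.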
Hi Iulia!
Thank you so much! Your suggestion is perfectly clear to me for the created_at timestamp.
But does it work for updated_at timestamps too? Because I only see "field": "created_at" in the pipeline processors...
I almost always use upsert logic to insert/update documents, so I can't add created_at/updated_at timestamps on the client side, I suppose...
I believe the example will update the timestamp on both update and creation, so it would possibly be better renamed to updated_at. In order to get a created_at field you need a separate processor with a condition so that it only runs if the created_at field does not already exist.
The ingest.put_pipeline command works, and the updated_at field is set (on every upsert), but the created_at field is never set, even though it is specified in the mappings...
Sorry, I don't know how to add an if clause to check whether the field exists... I don't know where to add it, what the conventions are for addressing fields, or even which language I should use for the test... Is it Python? Or Painless (Java, I suppose)?
In any case, both fields do exist in the mappings...
One more problem now... :-/ The created_at field keeps updating on every upsert, even with "override": False.
Also, updated_at is set on the first insertion too, even with "if": "ctx?.created_at != null", but this is not a problem for me...
This is my update statement, if it can help...
response = self._es.update(
    index=indexName,
    id=id,
    doc=doc,
    doc_as_upsert=True,
)
Sorry, I had it the wrong way around - you want the condition to be on the created_at field - so that value is only set a single time (when the document is first indexed, while the field is still null).
And the updated_at field will update every single time you make a change (including when you first create the document, so indeed you will always have both fields filled in).
This works for me with the created_at not changing while updated_at does:
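Roughly like this, assuming the same placeholder pipeline id as before - the condition on created_at is written in Painless and keeps that processor from running once the field exists:

es.ingest.put_pipeline(
    id="timestamp-pipeline",
    description="Maintain created_at and updated_at timestamps",
    processors=[
        {
            "set": {
                "field": "created_at",
                "value": "{{_ingest.timestamp}}",
                # Painless condition: only set created_at while it is still missing
                "if": "ctx.created_at == null",
            }
        },
        {
            # No condition, so updated_at is refreshed on every index/update.
            "set": {
                "field": "updated_at",
                "value": "{{_ingest.timestamp}}",
            }
        },
    ],
)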
Unfortunately I still always get created_at set to the same value as updated_at.
I didn't know that "Using ingest pipelines with doc_as_upsert is not supported."
But how can I avoid doc_as_upsert, if I have to insert a document when it is new and update it when it already exists?
(However, the really important field for me is updated_at; I can live without created_at...)
Okay, I checked on my side with doc_as_upsert = True and it still works: the updated_at field is updated while created_at stays the same.
Having updated_at set from the beginning is the expected behavior - since creating the document also counts as an update.
Not sure I understand what's not working on your side - with the code I posted last, you should get a new value for updated_at every time you run your update command.
Can you make sure you copied the latest version? The order of created & updated changed to put the if statement in the correct place, so maybe you missed that?
The created_at field should not update at any point other than the very first index operation, because that is the only time the field is null. So just make sure you have that if statement set on the created_at processor.
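That processor on its own would look something like this (same field name and Painless condition as in the pipeline sketch above):

{
    "set": {
        "field": "created_at",
        "value": "{{_ingest.timestamp}}",
        "if": "ctx.created_at == null"
    }
}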
At last I understood my mistake: I did not change anything in my documents between upserts!
As soon as I added a random string to a field, everything now works as expected! (Presumably the unchanged update was detected as a no-op, so nothing was re-indexed and the pipeline never ran.)
Thanks for your time, for your explanations, and for your kindness!