How to add created_at and updated_at fields

Hi.
I'm quite new to Elasticsearch. I'm using the Python client (v8.12.0).
I'd like to add the timestamp fields created_at and updated_at to every document in my index.
Reading various docs, I think I have to use IngestClient, in a rather convoluted way... To start with, I don't even understand how I should install it (using pip?)

Can anybody guide me in adding two simple timestamp fields, created_at and updated_at, which are supposed to be filled in automatically by ES on document creation/update?

Hi!

You can use ingest pipelines to set custom rules for which fields get added to your documents, and the ingest metadata provides a timestamp (_ingest.timestamp) that you can copy into a field.

I just tested out this example for you:

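from elasticsearch import Elasticsearch

# Assumption: a cluster reachable at this URL; adjust URL/credentials to your setup.
es = Elasticsearch("http://localhost:9200")

index_name = "test_timestamp"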

mappings = {
    "properties" : {
        "foo" : {
            "type" : "text"
        },
        "created_at": {
            "type": "date" 
        },
        "updated_at": {
            "type": "date" 
        }
    }
}

settings = {
    "index.default_pipeline" : "ingest_with_dates"
}

es.ingest.put_pipeline(
    id="ingest_with_dates", 
    processors=[
    {
        "set": {
            "field": "created_at",
            "value": "{{_ingest.timestamp}}"
        }
    }]
)

es.indices.create(index=index_name, mappings=mappings, settings=settings)

The main things here are:

  • setting the index mapping that you expect the date field;
  • using an ingest pipeline that sets that timestamp in the field you expect every time a document gets added;
  • and making this pipeline the default way to add documents to the index (through the settings)

Then if you simply add a document like this (with just the fields you want to add):

es.index(
    index=index_name,
    id=0,
    document={
        "foo": "bar",
    },
)

The timestamp will automatically be added through the pipeline you set. So when you search through your documents you will see that that field has been filled:

query={
    "match": {
        "foo": "bar"
    }
}

response = es.search(index=index_name, query=query)
for hit in response["hits"]["hits"]:
    print(hit['_source'])

{'created_at': '2024-03-12T12:50:02.626995027Z', 'foo': 'bar'}

I've just added the full example to a GitHub page, just in case.

Hi Iulia!
Thank you so much! Your suggestion is perfectly clear to me for created_at timestamp.
But does it work for updated_at timestamps too? Because I only see "field": "created_at" in the pipeline processors...
I almost always use upsert logic to insert/update documents, so I can't add created_at/updated_at timestamps on client-side, I suppose...

I believe the example will set the timestamp on both creation and update, so the field would arguably be better named updated_at. To get a created_at field, you need a separate processor with a condition so that it only runs if created_at does not already exist.
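
To see why the second, conditional processor matters, here is a minimal local simulation in plain Python (no cluster involved). It only approximates what the two set processors would do, and it assumes the pipeline sees the full stored document (including an existing created_at) on update. The helper name run_timestamp_pipeline and the fixed timestamps are hypothetical; ctx is modelled as a dict.

```python
from typing import Any, Dict


def run_timestamp_pipeline(ctx: Dict[str, Any], now: str) -> Dict[str, Any]:
    """Approximate the two `set` processors on a document modelled as a dict.

    Hypothetical helper for illustration only: it does not call Elasticsearch.
    """
    doc = dict(ctx)  # work on a copy of the incoming document
    # Unconditional `set`: always overwrite updated_at with the ingest time.
    doc["updated_at"] = now
    # Conditional `set` ("if": "ctx?.created_at == null"): only fill
    # created_at when the document does not have it yet.
    if doc.get("created_at") is None:
        doc["created_at"] = now
    return doc


first = run_timestamp_pipeline({"foo": "bar"}, "2024-03-12T17:54:53Z")
second = run_timestamp_pipeline({**first, "foo": "bar_test8"}, "2024-03-12T17:55:30Z")
print(first)   # both timestamps equal on the first write
print(second)  # created_at kept, updated_at moved forward
```

Without the condition, the second call would overwrite created_at too, which is exactly the behaviour being avoided.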

Perfect, thanks!
Can you please give an example of writing a processor with a condition? Should I use a script? (Sorry, I'm really new to ES... :-/)

Have a look at the examples in the docs.

I ended up with this code:

        self._es.ingest.put_pipeline(
          id = "ingest_with_timestamps", 
          processors =  [
            {
              "set": {
                "field": "created_at",
                "value": "{{_ingest.timestamp}}",
                "override": false
              }
            },
            {
              "set": {
                "field": "updated_at",
                "value": "{{_ingest.timestamp}}",
                "override": true
              }
            }
          ]
        )

Couldn't test it yet, I'll do it ASAP...
Thanks, everybody!

Could you perhaps add an if clause to check if the field exists?

The ingest.put_pipeline command works, and the updated_at field is set (on every upsert), but the created_at field is never set, even though it is specified in the mappings...

Sorry, I don't know how to add an if clause to check whether the field exists... I don't know where to add it, what the conventions are for referencing fields, or even which language I should use for the check... Is it Python? Or Painless (Java, I suppose)?
In any case, both fields do exist in the mappings...

I think this is what you're looking for:

es.ingest.put_pipeline(
    id="ingest_with_dates", 
    processors=[
    {
        "set": {
            "field": "created_at",
            "value": "{{_ingest.timestamp}}",
            "override": False
        }
    },{
        "set": {
            "if" : "ctx?.created_at != null",
            "field": "updated_at",
            "value": "{{_ingest.timestamp}}"
        }
    }]
)

  • Use False with a capital F so that it is a Python boolean
  • The default for override is true, so there is no need to set it
  • The "if" condition is a Painless expression
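
The capital-F point is about Python, not Elasticsearch: the processor dict is serialized to JSON, so Python's False becomes JSON false, while a bare lowercase false is simply an undefined name in Python. A minimal illustration using only the standard json module (the client does its own serialization, but booleans map the same way):

```python
import json

processor = {
    "set": {
        "field": "created_at",
        "value": "{{_ingest.timestamp}}",
        "override": False,  # Python boolean; serialized as JSON false
    }
}

body = json.dumps(processor)
print(body)

# A lowercase `false` is not a Python keyword, so looking it up fails at runtime.
try:
    broken = {"override": false}  # noqa: F821 (deliberately undefined name)
except NameError as exc:
    print("NameError:", exc)
```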

Then if you run an update like this:

es.update(index=index_name, id=0, doc={"foo": "baree"})

Only the updated_at field will change:

{'updated_at': '2024-03-12T15:46:23.628032096Z', 'created_at': '2024-03-12T15:46:05.431077621Z', 'foo': 'baree'}

Thanks Iulia...

Sorry, my mistake... I don't even get the updated_at field...
I was only seeing it because I had added it in my update statement... :frowning:

And, I do have both of them in my mappings...

      "created_at": {
        "type": "date"
      },
      "updated_at": {
        "type": "date"
      }

But (hurrah!) now I have both fields set!!! :tada: :tada: :tada:

I had to change

settings = {
    "index.default_pipeline" : "ingest_with_dates"
}

to

settings = {
    "default_pipeline" : "ingest_with_dates"
}

One more problem now... :-/
The created_at field keeps updating on every upsert, even with "override": False.
Also, updated_at is set on the first insertion too, even with "if" : "ctx?.created_at != null", but this is not a problem for me...

This is my update statement, if it helps...

      response = self._es.update(
        index = indexName,
        id = id,
        doc = doc,
        doc_as_upsert = True,
      )

Hey,

Sorry, I had it the wrong way around: the condition belongs on the created_at processor, so that the value is set only once, when the document is first indexed and created_at is still null.

And the updated_at field will be updated every single time you make a change (including when the document is first created, so indeed you will always have both fields filled in).

This works for me with the created_at not changing while updated_at does:

index_name = "test_timestamp"

mappings = {
    "properties" : {
        "foo" : {
            "type" : "text"
        },
        "created_at": {
            "type": "date" 
        },
        "updated_at" : {
            "type" : "date"
        }
    }
}

settings = {
    "index.default_pipeline" : "ingest_with_dates"
}

es.ingest.put_pipeline(
    id="ingest_with_dates", 
    processors=[
    {
        "set": {
            "field": "updated_at",
            "value": "{{_ingest.timestamp}}"
        }
    },{
        "set": {
            "if" : "ctx?.created_at == null",
            "field": "created_at",
            "value": "{{_ingest.timestamp}}"
        }
    }]
)


es.indices.create(index=index_name, mappings=mappings, settings=settings)

It might be your use of upsert: at the very bottom of the update API docs, it says:

Using ingest pipelines with doc_as_upsert is not supported.

So that may be interfering with the pipeline runs as you defined them. Can you try the code without doc_as_upsert to see if we get consistent results?

I updated the full code on the repo so you can see it from start to finish.

Thanks!

Unfortunately, I always get updated_at set to the same value as created_at.

I didn't know that using ingest pipelines with doc_as_upsert is not supported.
But how can I avoid doc_as_upsert, if I have to insert a document when it is new and update it when it is already present?

(however, the really important field for me is updated_at, I can live without a created_at... :slight_smile: )

Okay, I checked on my side with doc_as_upsert = True and it still updates the updated_at field, while created_at stays the same.

Having updated_at set from the beginning is the expected behavior, since creating the document also counts as an update.

Not sure I understand what's not working on your side - with the code I posted last you would get a new value for updated_at every time you run your update command.
Can you make sure you copied the latest version? The order of created & updated changed to put the if statement in the correct part so maybe you missed that?

On my side, the only issue is that the created_at field gets updated on every update, which it should not...

In the latest version of your code I do not see doc_as_upsert...

Thanks a lot for your support!

This is what I used to update the docs:

es.update(index=index_name, id = 0, doc = {"foo" : "bar_test6"}, doc_as_upsert=True)

I tested by changing the foo value a bunch of times and searching to see how the dates changed on that document

query={
    "match": {
        "_id": 0
    }
}
response = es.search(index=index_name, query=query)
for hit in response["hits"]["hits"]:
    print(hit['_source'])

So when I first create it, the result is:

{'updated_at': '2024-03-12T17:54:53.528152653Z', 'created_at': '2024-03-12T17:54:53.528152653Z', 'foo': 'bar'}

Then after a few updates I get to:

{'updated_at': '2024-03-12T17:55:30.138842685Z', 'created_at': '2024-03-12T17:54:53.528152653Z', 'foo': 'bar_test8'}
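
If you want to automate this check instead of eyeballing the printed sources, a small helper can compare two snapshots of the same document's _source. This is a sketch: timestamps_look_right is a hypothetical name, and it relies on both values being UTC ISO-8601 strings in the same format (as in the output above), which makes plain string comparison chronological.

```python
from typing import Any, Dict


def timestamps_look_right(before: Dict[str, Any], after: Dict[str, Any]) -> bool:
    """Return True if created_at is unchanged and updated_at did not go backwards.

    Assumes both timestamps are UTC ISO-8601 strings with the same number of
    fractional digits, so lexicographic order matches chronological order.
    """
    created_stable = before["created_at"] == after["created_at"]
    updated_moved = after["updated_at"] >= before["updated_at"]
    return created_stable and updated_moved


# The two _source snapshots shown above.
first = {'updated_at': '2024-03-12T17:54:53.528152653Z', 'created_at': '2024-03-12T17:54:53.528152653Z', 'foo': 'bar'}
later = {'updated_at': '2024-03-12T17:55:30.138842685Z', 'created_at': '2024-03-12T17:54:53.528152653Z', 'foo': 'bar_test8'}
print(timestamps_look_right(first, later))  # True
```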

The created_at field should not change at any point after the very first index operation, because that is the only time the field is null. So as long as you have that if condition set in the pipeline like this:

"set": {
    "if" : "ctx?.created_at == null",
    "field": "created_at",
    "value": "{{_ingest.timestamp}}"
}

It shouldn't change anymore.
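
To check the pipeline without indexing anything, you can also dry-run it with the ingest simulate API (POST _ingest/pipeline/<id>/_simulate). Below, build_simulate_docs and dry_run are hypothetical helpers; es.ingest.simulate(id=..., docs=...) is the 8.x Python client call for that endpoint, but please check it against your client version. A sample that already carries created_at should come back with that value untouched.

```python
from typing import Any, Dict, List


def build_simulate_docs(*sources: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Wrap plain document bodies in the {"_source": ...} shape the simulate API expects."""
    return [{"_source": dict(source)} for source in sources]


def dry_run(es: Any, pipeline_id: str = "ingest_with_dates") -> None:
    """Run the stored pipeline against two sample documents (needs a live cluster)."""
    docs = build_simulate_docs(
        {"foo": "new document"},  # created_at should be filled in
        {"foo": "existing", "created_at": "2024-03-12T17:54:53.528152653Z"},  # should be kept
    )
    response = es.ingest.simulate(id=pipeline_id, docs=docs)
    for result in response["docs"]:
        print(result["doc"]["_source"])


# Only the payload builder runs here; dry_run(es) needs a connected client.
print(build_simulate_docs({"foo": "bar"}))
```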

Thanks Iulia!

At last I understood my mistake: I did not change anything in my documents between upserts!
As soon as I added a random string to a field, everything worked as expected!
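
This fits the update API's default noop detection (the detect_noop parameter, true by default): if the doc in an update would not change the stored document, the response reports "noop" and the document is not reindexed, so the pipeline has nothing to stamp. A minimal sketch for spotting this from the Python client, assuming you inspect the result field of the response returned by es.update(...); describe_update is a hypothetical helper and the sample responses are hand-written, not real cluster output.

```python
from typing import Any, Dict


def describe_update(response: Dict[str, Any]) -> str:
    """Interpret the `result` field of an update/upsert response for the timestamp pipeline."""
    result = response.get("result")
    if result == "created":
        return "new document: created_at and updated_at should both be set"
    if result == "updated":
        return "existing document changed: updated_at should be refreshed"
    if result == "noop":
        return "nothing changed: document not reindexed, timestamps untouched"
    return f"unexpected result: {result!r}"


# Hand-written sample responses (only the field this helper reads).
for sample in ({"result": "created"}, {"result": "updated"}, {"result": "noop"}):
    print(describe_update(sample))
```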

Thanks for your time, for your explanations, and for your kindness!

ah, awesome! Glad it worked in the end! Happy to help!
