Hi.
I'm quite new to Elasticsearch. I'm using the Python client (v8.12.0).
I'd like to add the timestamp fields created_at and updated_at to every document in my index.
Reading various docs, I think I have to use IngestClient, in quite a convoluted way... To start with, I don't even understand how I should install it (using pip?).
Can anybody guide me to add two simple timestamp fields, created_at and updated_at, which are supposed to be automatically filled by ES on document creation/update?
The timestamp will automatically be added through the pipeline you set (there is a sketch of creating such a pipeline right after the search example below). So when you search through your documents you will see that the field has been filled:
query = {
    "match": {
        "foo": "bar"
    }
}
response = es.search(index=index_name, query=query)
for hit in response["hits"]["hits"]:
    print(hit["_source"])
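For reference, creating such a pipeline with the Python client could look roughly like this - no separate install is needed, the ingest client ships with the elasticsearch package and is available as es.ingest. The pipeline id timestamp-pipeline is just a placeholder, and attaching it through the index.default_pipeline setting is one way to have it run on every index operation:

from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")  # adjust to your cluster

# Ingest pipeline with a single set processor that stamps each document
# with the ingest timestamp.
es.ingest.put_pipeline(
    id="timestamp-pipeline",  # placeholder name
    description="Add a created_at timestamp to every document",
    processors=[
        {
            "set": {
                "field": "created_at",
                "value": "{{_ingest.timestamp}}",
            }
        }
    ],
)

# Attach the pipeline to the index so it runs on every index operation.
es.indices.create(
    index=index_name,
    settings={"index.default_pipeline": "timestamp-pipeline"},
)

index.default_pipeline is a dynamic setting, so if the index already exists it can also be applied with a settings update (es.indices.put_settings) instead of at creation time.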
Hi Iulia!
Thank you so much! Your suggestion is perfectly clear to me for the created_at timestamp.
But does it work for updated_at timestamps too? Because I only see "field": "created_at" in the pipeline processors...
I almost always use upsert logic to insert/update documents, so I can't add created_at/updated_at timestamps on the client side, I suppose...
I believe the example will update the timestamp on both update and creation, so it would possibly be better renamed to updated_at. In order to get a created_at field you need a separate processor with a condition so that it only runs if the created_at field does not already exist.
The ingest.put_pipeline command works, and the updated_at field is set (on every upsert), but the created_at field is never set, even though it is specified in the mappings...
Sorry, I don't know how to add an if clause to check whether the field exists... I don't know where to add it, what the conventions are for addressing fields, or even which language I should use for the test... Is it Python? Or Painless (Java, I suppose)?
In any case, both fields do exist in the mappings...
One more problem now... :-/ The created_at field keeps updating on every upsert, even with "override": False.
Also, updated_at is set on the first insertion too, even with "if": "ctx?.created_at != null", but this is not a problem for me...
This is my update statement, if it can help...
response = self._es.update(
    index=indexName,
    id=id,
    doc=doc,
    doc_as_upsert=True,
)
Sorry, I had it the wrong way around - you want the condition to be on the created_at field - so that value is only set a single time (when the document is first indexed, while the field is still null).
And the updated_at field will update every single time you make a change (including when you first create the document, so indeed you will always have both fields filled in).
This works for me with the created_at not changing while updated_at does:
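Roughly like this, assuming the same placeholder pipeline id as before - the condition on created_at is written in Painless and keeps that processor from running once the field exists:

es.ingest.put_pipeline(
    id="timestamp-pipeline",
    description="Maintain created_at and updated_at timestamps",
    processors=[
        {
            "set": {
                "field": "created_at",
                "value": "{{_ingest.timestamp}}",
                # Painless condition: only set created_at while it is still missing
                "if": "ctx.created_at == null",
            }
        },
        {
            # No condition, so updated_at is refreshed on every index/update.
            "set": {
                "field": "updated_at",
                "value": "{{_ingest.timestamp}}",
            }
        },
    ],
)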
Unfortunately I still always get created_at set to the same value as updated_at.
I didn't know that "Using ingest pipelines with doc_as_upsert is not supported."
But how can I avoid doc_as_upsert, if I have to insert a document when it is new and update it when it already exists?
(However, the really important field for me is updated_at; I can live without created_at...)
Okay, I checked on my side with doc_as_upsert = True and it still works: the updated_at field is updated while created_at stays the same.
Having updated_at set from the beginning is the expected behavior - since creating the document also counts as an update.
Not sure I understand what's not working on your side - with the code I posted last, you should get a new value for updated_at every time you run your update command.
Can you make sure you copied the latest version? The order of created & updated changed to put the if statement in the correct place, so maybe you missed that?
The created_at field should not update at any point other than the very first index operation, because that is the only time the field is null. So just make sure you have that if statement set on the created_at processor.
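That processor on its own would look something like this (same field name and Painless condition as in the pipeline sketch above):

{
    "set": {
        "field": "created_at",
        "value": "{{_ingest.timestamp}}",
        "if": "ctx.created_at == null"
    }
}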
At last I understood my mistake: I did not change anything in my documents between upserts!
As soon as I added a random string to a field, everything now works as expected! (Presumably the unchanged update was detected as a no-op, so nothing was re-indexed and the pipeline never ran.)
Thanks for your time, for your explanations, and for your kindness!