How to set the “_id” value of an Elasticsearch document to my custom document id

Can I do this outside of Elasticsearch, in my Python script, before I send the data?

Something like:

PUT test/_doc/1234
{
 "id": "1234"
}
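
Since the question mentions a Python script, the equivalent with the official elasticsearch Python client would be something like this minimal sketch (the host, index name, and document body are assumptions; older client versions take body= instead of document=):

from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")  # assumed local cluster

# Passing id= makes Elasticsearch store the document under _id "1234"
# instead of auto-generating an id; indexing the same id again
# overwrites the existing document.
es.index(index="test", id="1234", document={"id": "1234"})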

@dadoonet - thank you. I actually want to set it from a field in my document.

For example: my JSON document contains a field currDetails_container_id. I want to set that as _id so that duplicates are removed and the document for that id is simply updated each time a new document with the same id arrives.

If you can't set it yourself on the client side, you can use an ingest pipeline like:

POST _ingest/pipeline/_simulate
{
  "pipeline": {
    "processors": [
      {
        "set": {
          "field": "_id",
          "value": "{{currDetails_container_id}}"
        }
      }
    ]
  },
  "docs": [
    {
      "_source": {
        "currDetails_container_id": "1234"
      }
    }
  ]
}

This gives:

{
  "docs" : [
    {
      "doc" : {
        "_index" : "_index",
        "_type" : "_doc",
        "_id" : "1234",
        "_source" : {
          "currDetails_container_id" : "1234"
        },
        "_ingest" : {
          "timestamp" : "2020-11-10T16:46:08.519079145Z"
        }
      }
    }
  ]
}

@dadoonet - thank you very much. I want this applied to all incoming documents, so I don't want to specify the documents in the POST. I tried just:
POST _ingest/pipeline/_simulate
{
  "pipeline": {
    "processors": [
      {
        "set": {
          "field": "_id",
          "value": "{{currDetails_container_id}}"
        }
      }
    ]
  }
}

and this gave an error that the [docs] required property is missing. I have real-time streaming data in my scenario and want the pipeline to update documents as the data comes in.

Can I do something like this?

PUT _ingest/pipeline/my_pipeline
{
  "description": "updates _id with container id at the time of ingestion",
  "processors": [
    {
      "set": {
        "field": "_id",
        "value": "{{currDetails_container_id}}"
      }
    }
  ]
}

Sure.

Read the documentation about ingest pipelines.
Then feel free to ask about what you don't understand. I'll be happy to help.
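
For example (a sketch, not from the thread): once the pipeline has been created with the PUT above, it can be applied per indexing request via the pipeline parameter. In Python, assuming the same local cluster and a test index:

from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")  # assumed local cluster

# pipeline="my_pipeline" runs the ingest pipeline on this document,
# so the set processor copies currDetails_container_id into _id
# before the document is stored.
es.index(
    index="test",
    document={"currDetails_container_id": "1234"},
    pipeline="my_pipeline",
)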

Hi @dadoonet,

Thank you very much for helping me with this. I am able to update the id with the container_id; however, it's not giving me the latest document, and some documents are not getting updated. Do you know how I can ensure that each document gets updated with the latest version? There is a time field in my document, and I can see that the last document being shown is an old one that never got updated.

I have no idea what you have or what exactly you have done, so I cannot help.

Could you provide a full reproduction script, as described in About the Elasticsearch category? It will help to better understand what you are doing. Please try to keep the example as simple as possible.

A full reproduction script is something anyone can copy and paste into the Kibana Dev Console and run to reproduce your use case. It will help readers understand, reproduce, and if needed fix your problem. It will also most likely get you a faster answer.

I am so sorry. My bad! I will be clearer in my requests next time. The solution you recommended worked perfectly for me. Thank you!

For others who are having the same issue:
These are the requests I ran. With this, my _id is replaced by container_id, which removes all duplicates and keeps each document updated with the latest container state.

PUT my_index_2020-11-11
{
  "settings": {
    "default_pipeline": "dailyindex"
  }
}

PUT _ingest/pipeline/dailyindex
{
  "description": "updates _id with container id at the time of ingestion",
  "processors": [
    {
      "set": {
        "field": "_id",
        "value": "{{container_id}}"
      }
    }
  ]
}
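
As a rough sketch of the streaming side (the host, the docs list, and any field other than container_id are made up for illustration), bulk indexing from Python then needs no per-request pipeline parameter, because default_pipeline applies it automatically:

from elasticsearch import Elasticsearch, helpers

es = Elasticsearch("http://localhost:9200")  # assumed local cluster

# Two states of the same container; no _id is set here because the
# "dailyindex" default pipeline copies container_id into _id, so the
# second document overwrites the first instead of duplicating it.
docs = [
    {"container_id": "1234", "state": "running"},
    {"container_id": "1234", "state": "stopped"},
]

helpers.bulk(
    es,
    ({"_index": "my_index_2020-11-11", "_source": d} for d in docs),
)

Note that the pipeline has to exist before any documents are indexed: if default_pipeline points at a pipeline that does not exist yet, indexing requests fail.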
