I am using the reindex API to reindex 5 TB of data. To make the process faster, I would like the new index to use auto-generated IDs, as mentioned here. How do we do that?
Hi @praveengadugin1 ,
As the name suggests, auto-generated means automatic: you just need to remove the id field that you have and reindex your data. Elasticsearch will then assign an auto-generated value in the "_id" field.
Have a look at ingest pipelines to remove the id field if you have one.
You can try:
POST my_index/_doc
{"foo": "bar"}
It will return:
{
"_index" : "my_index",
"_type" : "_doc",
"_id" : "X80Qu2sBMcYBL_rB5lGJ",
"_version" : 1,
"result" : "created",
The auto-generated ID is "X80Qu2sBMcYBL_rB5lGJ".
Hope it helps.
I am linking my old post on the same topic.
Try:
POST _reindex
{
  "source": {
    "index": ["twitter", "twitter2"]
  },
  "dest": {
    "index": "new_twitter"
  },
  "script": {
    "source": "ctx._id=null",
    "lang": "painless"
  }
}
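If you drive the reindex from a script rather than the console, the same request body can be built programmatically. This is only a minimal sketch: the helper name `build_reindex_body` is made up for illustration, and it only builds and prints the JSON. Sending it to `POST _reindex` with your HTTP client of choice is left out.

```python
import json


def build_reindex_body(source_indices, dest_index):
    """Build a _reindex body whose script clears _id on every document.

    Hypothetical helper: it mirrors the console request above and
    does not talk to a cluster.
    """
    return {
        "source": {"index": list(source_indices)},
        "dest": {"index": dest_index},
        # Setting ctx._id to null asks Elasticsearch to generate a new ID.
        "script": {"source": "ctx._id=null", "lang": "painless"},
    }


body = build_reindex_body(["twitter", "twitter2"], "new_twitter")
print(json.dumps(body, indent=2))
```

The index names are the ones from the request above; swap in your own.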
Hi,
With terabytes of data, my guess is that ingest may be faster than a script, but it is better to run your own benchmark and share the results here, as they may be useful for other people.
For ingest, you can use the remove processor to remove the id field:
https://www.elastic.co/guide/en/elasticsearch/reference/current/remove-processor.html
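A rough sketch of how the ingest variant could be wired up, assuming the field to drop is literally called `id` and using a made-up pipeline name `drop_id`. The helper names are hypothetical too. The first body would go to `PUT _ingest/pipeline/drop_id`, the second to `POST _reindex`. Note that the remove processor here acts on a field in the document body, so please verify on a small index that the resulting "_id" values are what you expect before running it on the full data set.

```python
import json


def build_remove_pipeline(field_name):
    """Pipeline definition with a single remove processor (hypothetical helper)."""
    return {
        "description": "Drop the " + field_name + " field before indexing",
        "processors": [
            # ignore_missing avoids failures on documents without the field.
            {"remove": {"field": field_name, "ignore_missing": True}}
        ],
    }


def build_reindex_with_pipeline(source_index, dest_index, pipeline_id):
    """_reindex body that routes documents through an ingest pipeline."""
    return {
        "source": {"index": source_index},
        "dest": {"index": dest_index, "pipeline": pipeline_id},
    }


pipeline = build_remove_pipeline("id")
reindex = build_reindex_with_pipeline("twitter", "new_twitter", "drop_id")
print(json.dumps(pipeline, indent=2))
print(json.dumps(reindex, indent=2))
```

Keeping the pipeline and the reindex body as separate pieces makes it easy to benchmark this against the script-based approach with the same source and destination.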