Reindex multiple types from one index to a single type in another index

I have two indexes:
twitter and reitwitter

twitter has multiple documents across different types like:
"hits": [
  {
    "_index": "twitter",
    "_type": "tweet",
    "_id": "1",
    "_score": 1,
    "_source": {
      "message": "trying out Elasticsearch"
    }
  },
  {
    "_index": "twitter",
    "_type": "tweet2",
    "_id": "1",
    "_score": 1,
    "_source": {
      "message": "trying out Elasticsearch2"
    }
  },
  {
    "_index": "twitter",
    "_type": "tweet1",
    "_id": "1",
    "_score": 1,
    "_source": {
      "message": "trying out Elasticsearch1"
    }
  }
]

Now, when I reindex, I want to get rid of all the different types and use just one, because they all have essentially the same field mappings.

I tried several different combinations but I always only get one document instead of those three:
Approach 1:
POST _reindex/
{
  "source": {
    "index": "twitter"
  },
  "dest": {
    "index": "reitwitter",
    "type": "reitweet"
  }
}

Response:
{
  "took": 12,
  "timed_out": false,
  "total": 3,
  "updated": 3,
  "created": 0,
  "deleted": 0,
  "batches": 1,
  "version_conflicts": 0,
  "noops": 0,
  "retries": {
    "bulk": 0,
    "search": 0
  },
  "throttled_millis": 0,
  "requests_per_second": -1,
  "throttled_until_millis": 0,
  "failures": []
}
Note: I guess it says "updated": 3 because this was the second time I made the same call?

Approach 2:
POST _reindex/
{
  "source": {
    "index": "twitter",
    "query": {
      "match_all": {}
    }
  },
  "dest": {
    "index": "reitwitter",
    "type": "reitweet"
  }
}

Same response as the first one.

In both cases when I do this:
GET reitwitter/_search
{
  "query": {
    "match_all": {}
  }
}

I only get one document:

{
  "_index": "reitwitter",
  "_type": "reitweet",
  "_id": "1",
  "_score": 1,
  "_source": {
    "message": "trying out Elasticsearch1"
  }
}

Is this use case even supported by reindex? If not, do I have to write a script that uses scan and scroll to fetch all the documents from the source index and index them under a single doc type in the destination?

PS: I don't want to use "_source": ["tweet1", "tweet"] because I have around a million doc types, each holding a single document, that I want to map to the same doc type in the destination.

Try the following request instead. Basically it uses a script to concatenate the type and the id of the source documents and set that as the id on the destination document.

POST _reindex/
{
  "source": {
    "index": "twitter"
  },
  "dest": {
    "index": "reitwitter",
    "type": "tweet"
  },
  "script": {
    "source": "ctx._id = ctx._type + '-' + ctx._id",
    "lang": "painless"
  }
}

Note: I have assumed you are running version 5.4.0. If you are running an earlier version, you may need to swap "source" for "inline" in the script object.
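For anyone scripting this rather than pasting it into the console, here is a minimal Python sketch that only builds the request body above, with the script key left as a parameter so it can be switched between "source" and "inline". The helper name build_reindex_body and its parameters are made up for illustration; nothing here talks to a cluster, and which key your version accepts is something to check against your own setup.

```python
import json


def build_reindex_body(source_index, dest_index, dest_type, script_key="source"):
    """Build the _reindex request body from the answer above.

    script_key is a hypothetical knob: pass "inline" if your version
    rejects "source" in the script object.
    """
    if script_key not in ("source", "inline"):
        raise ValueError("script_key must be 'source' or 'inline'")
    return {
        "source": {"index": source_index},
        "dest": {"index": dest_index, "type": dest_type},
        "script": {
            # Same Painless one-liner as in the answer: prefix the id with the type.
            script_key: "ctx._id = ctx._type + '-' + ctx._id",
            "lang": "painless",
        },
    }


body = build_reindex_body("twitter", "reitwitter", "tweet", script_key="inline")
print(json.dumps(body, indent=2))
```

The printed JSON can be sent as the body of POST _reindex with whatever HTTP client you already use.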


Worked like a charm.
So I guess the problem was that reindex does not change ids when it finds duplicates in the destination doc type.
It just copies each document into that doc type with the same id.

Since all three documents had the same id, only the last one showed up even though it said "updated 3" in the response.
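To make the overwrite concrete, here is a rough plain-Python simulation of it. This is not the Elasticsearch client or its internals: the destination is just modelled as a dict keyed by (type, id), and the reindex function and its counters are invented for illustration.

```python
def reindex(hits, dest_type, rewrite_id=None):
    """Simulate writing hits into a single destination type.

    The destination is a dict keyed by (_type, _id), so a second write
    to the same key replaces the first one.
    """
    dest = {}
    counts = {"created": 0, "updated": 0}
    for hit in hits:
        doc_id = rewrite_id(hit) if rewrite_id else hit["_id"]
        key = (dest_type, doc_id)
        counts["updated" if key in dest else "created"] += 1
        dest[key] = hit["_source"]
    return dest, counts


hits = [
    {"_type": "tweet", "_id": "1", "_source": {"message": "trying out Elasticsearch"}},
    {"_type": "tweet2", "_id": "1", "_source": {"message": "trying out Elasticsearch2"}},
    {"_type": "tweet1", "_id": "1", "_source": {"message": "trying out Elasticsearch1"}},
]

# Without an id rewrite, all three hits land on ("reitweet", "1").
collapsed, collapsed_counts = reindex(hits, "reitweet")

# With the type prefixed to the id, each hit gets its own key.
distinct, distinct_counts = reindex(
    hits, "reitweet", rewrite_id=lambda h: h["_type"] + "-" + h["_id"]
)

print(len(collapsed), collapsed_counts)
print(len(distinct), distinct_counts)
```

In this toy model a first run into an empty destination reports one create and two updates; a second run over an already populated destination would report only updates, which matches the response in the question.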

I had to change "source" to "inline", though.

A quick question though:

"inline": "ctx._id = ctx._type + '-' + ctx._id"

How does it know that ctx._id (on the left) is supposed to be the new id for the destination document?

From what I see, everything modifies the magic variable "ctx".

So when it is used on the left side it refers to the destination, and on the right side it refers to the source?

Yes. Basically, the ctx object represents the document that will be written to the destination index. Before the script runs, it is simply set to the source document; you can then make in-place updates like the one above so that ctx represents the document you actually want written to the destination index.
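The same idea in a small Python sketch, purely as an analogy: ctx is modelled as a copy of the source hit that the script mutates before it is written out. The names apply_script and concat_type_and_id are hypothetical and say nothing about how Elasticsearch implements this internally.

```python
import copy


def apply_script(source_hit, script):
    """Return the document that would be written after the script has run."""
    # ctx starts out as a copy of the source document.
    ctx = copy.deepcopy(source_hit)
    # The script edits ctx in place; it does not return anything.
    script(ctx)
    return ctx


def concat_type_and_id(ctx):
    # The right-hand side reads values that still come from the source;
    # the assignment then overwrites the id that will be written out.
    ctx["_id"] = ctx["_type"] + "-" + ctx["_id"]


source_hit = {
    "_type": "tweet1",
    "_id": "1",
    "_source": {"message": "trying out Elasticsearch1"},
}
dest_doc = apply_script(source_hit, concat_type_and_id)
print(dest_doc["_id"], source_hit["_id"])
```

So there is only one ctx: reading it gives whatever it currently holds, and assigning to it changes what ends up in the destination.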
