Reindex API is not routing documents in the destination index


(saurabh jaluka) #1

Hi, I am trying to use the reindex API to migrate data from index sites-v3 to sites-v5. It seems that _routing is not taking effect on the destination index.

Context:
I am using ES v5.4.0 with 20 shards. I have two indices, sites-v3 and sites-v5. The source index only has data for one user, and routing is on the user's id. The only difference in the sites-v5 mapping is that I am removing _parent from it. Each document in sites-v3 has a _routing and a _parent field, so I am removing the _parent field when I send the document to the destination index:

Reindex query I am using:
curl -XPOST 'localhost:9200/_reindex?pretty' -H 'Content-Type: application/json' -d'
{
  "source": {
    "index": "sites-v3",
    "size": 100
  },
  "dest": {
    "index": "sites-v5"
  },
  "script": {
    "inline": "ctx._parent=null;",
    "lang": "painless"
  }
}'

When I run _cat/shards on both indices, I see that in sites-v3 all documents are in one shard, since routing is by user id. After the migration, sites-v5 has the data spread across all shards, which makes me think that either I am doing something wrong or this is a bug.
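
For reference, the per-shard check described above could look roughly like the sketch below. The `ES_URL` variable and the `shard_docs` function name are illustrative assumptions, not something from the original post; `h=` only limits the output to a few of the `_cat/shards` columns.

```shell
# Hypothetical helper: print document counts per shard for the given indices.
# ES_URL is an assumed environment variable; it defaults to the host used in this thread.
shard_docs() {
  curl -s "${ES_URL:-localhost:9200}/_cat/shards/$1?v&h=index,shard,prirep,docs"
}
```

Calling `shard_docs 'sites-v3,sites-v5'` would list both indices; with routing preserved, all documents for one user should show up in a single shard of each index.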

Debug info:

curl -XPOST 'localhost:9200/_reindex?pretty' -H 'Content-Type: application/json' -d'
{
  "source": {
    "index": "sites-v3",
    "size": 1
  },
  "dest": {
    "index": "sites-v5"
  },
  "script": {
    "inline": "ctx._parent=null; Debug.explain(ctx)",
    "lang": "painless"
  }
}'

"to_string" : "{_routing=10a3098db733aa491bd453aa6db574b8, _ttl=null, op=index, _parent=null, _index=sites-v3, _type=post, _source={engineKey=10a3098db733aa491bd453aa6db574b8, image=[], documentType=post, author=[root], domain=http://localhost, externalId=1, inStock=false, categories=[Uncategorized], title=Hello world!, body=Welcome to WordPress. This is your first post. Edit or delete it, then start writing!, url=http://localhost/blog/2017/08/03/hello-world/, timestamp=2017-08-03T23:08:35.000}, _id=AV3NYmfOVC07KnJYanjR, _version=-1, _timestamp=null}"

That looks correct to me: _parent is set to null and _routing is there.

Please help me guys. I am stuck.

Saurabh


(Nik Everett) #2

Clearing the _parent automatically clears the _routing, because 99% of the time when you use _parent you also use _routing. Preserving the routing while clearing the parent isn't actually supported because of https://github.com/elastic/elasticsearch/issues/26183.

You can work around it by changing the routing in the script, like ctx._parent=null; ctx._routing = ctx._routing + "hack". That will give you a new routing value, and you'll have to update your application to use it. Alternatively, you can use something like the Perl client or Logstash to perform the reindex. I suspect they don't have that particular bug.
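
As a rough sketch, the script workaround from this reply could be written out as a full request, reusing the index names and batch size from the question. The `ES_URL` variable and the `run_reindex` function name are illustrative assumptions, and "hack" is just the placeholder suffix from the reply; this has not been verified against a 5.4.0 cluster.

```shell
# Request body: clear _parent and set a modified routing value so routing is not dropped.
# The quoted heredoc delimiter keeps the \" escapes intact for the JSON string.
REINDEX_BODY=$(cat <<'EOF'
{
  "source": {
    "index": "sites-v3",
    "size": 100
  },
  "dest": {
    "index": "sites-v5"
  },
  "script": {
    "inline": "ctx._parent = null; ctx._routing = ctx._routing + \"hack\";",
    "lang": "painless"
  }
}
EOF
)

# Hypothetical helper: send the reindex request.
# ES_URL is an assumed environment variable; it defaults to the host used in this thread.
run_reindex() {
  curl -XPOST "${ES_URL:-localhost:9200}/_reindex?pretty" \
    -H 'Content-Type: application/json' \
    -d "$REINDEX_BODY"
}
```

Calling `run_reindex` sends the request. Because the stored routing value changes, later requests against sites-v5 would need to pass the suffixed value, for example via the `routing` query parameter on a search.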

