The reason for this has to do with how document routing is done (how we determine which shard a particular document lives on).
By default, routing takes the id of the document (as specified in the URL path when indexing), hashes that value, and then takes the result modulo the number of shards (the remainder when the hash value is integer-divided by the number of shards). So, for example, if I create an index with 5 shards and index a document with id foo, then we do the following:
shardNum = hash("foo") % 5
This gives us a number between 0 and 4. Let's say in this case the result is 3, so the document will be indexed on shard 3 of that index. The same applies when you GET a document. You send a request like the following:
curl -XGET "http://localhost:9200/my_index/my_type/foo"
From just this information we need to know which shard to fetch the document from, so we use the same formula as above: we hash "foo" and take the result modulo the number of shards (5), which will again be 3, and we can go to the correct shard to get the document.
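To make the formula concrete, here is a minimal sketch of routing by id. It assumes a stand-in hash (CRC32, picked only because it gives the same value on every run); it is not the hash Elasticsearch uses internally, so the shard numbers it prints will not match a real cluster. The name shard_for is mine, not part of any API.

```python
import zlib


def shard_for(routing_value: str, num_shards: int) -> int:
    """Map a routing value to a shard number in the range [0, num_shards)."""
    # Stand-in hash: CRC32 is deterministic across runs, which is all we need
    # for the illustration. It is NOT Elasticsearch's real hash function.
    hashed = zlib.crc32(routing_value.encode("utf-8"))
    return hashed % num_shards


NUM_SHARDS = 5

# Indexing and GETting document "foo" both hash the same value,
# so both requests resolve to the same shard.
shard_on_index = shard_for("foo", NUM_SHARDS)
shard_on_get = shard_for("foo", NUM_SHARDS)
print(shard_on_index, shard_on_get)
```

The point is only that the shard is a pure function of the routing value and the shard count, so any request carrying the same routing value lands on the same shard.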
With parent-child documents it works slightly differently: when indexing a child document, instead of using the id as the routing value that we hash, we use the value of the parent query parameter. This means that both the parent and the child end up on the same shard, because the routing algorithm gets the same input for both and so produces the same result.
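As a rough sketch of that rule, assuming the same kind of stand-in hash (CRC32 here, not Elasticsearch's real one) and two hypothetical helper names, routing_value and shard_for:

```python
import zlib
from typing import Optional


def routing_value(doc_id: str, parent: Optional[str] = None) -> str:
    """Pick the value that gets hashed: the parent if one is given, else the id."""
    return parent if parent is not None else doc_id


def shard_for(value: str, num_shards: int) -> int:
    # Stand-in hash for illustration only (not Elasticsearch's real hash).
    return zlib.crc32(value.encode("utf-8")) % num_shards


NUM_SHARDS = 5

# Parent document "foo" is routed by its own id.
parent_shard = shard_for(routing_value("foo"), NUM_SHARDS)

# Child document "bar" is indexed with parent=foo, so "foo" is hashed again.
child_shard = shard_for(routing_value("bar", parent="foo"), NUM_SHARDS)

print(parent_shard == child_shard)
```

Because the child hashes the parent's id rather than its own, the comparison holds for any hash function and any number of shards.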
When you index a grandchild, the parent query parameter points to the child document, and nothing in the request indicates that there is a parent above that child. A request for the grandchild might look like this:
curl -XGET "http://localhost:9200/my_index/my_type/baz?parent=bar"
Because the request does not mention that the parent bar might have a parent of its own, Elasticsearch uses bar as the routing value in the routing algorithm, and the document may end up on a different shard from its parent (and grandparent), which had foo used as its routing value (I am assuming here that the document with id bar was indexed referencing a parent foo).
So we need to tell Elasticsearch that it should not use the parent as the routing value for the grandchild document, but should instead use the id of the document at the top generation (in this case "foo"). We do this by setting a custom routing value:
curl -XGET "http://localhost:9200/my_index/my_type/baz?parent=bar&routing=foo"
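If you are building these requests in client code, one way to get the routing value is to walk up the parent chain to the top generation. This is only a sketch: the PARENTS map, root_ancestor and grandchild_url are hypothetical names of mine, and it assumes your application already knows each document's parent id (Elasticsearch does not look this up for you).

```python
from urllib.parse import urlencode

# Hypothetical child -> parent map that the application keeps track of.
PARENTS = {"bar": "foo", "baz": "bar"}


def root_ancestor(doc_id: str, parents: dict) -> str:
    """Follow parent links upward until reaching a document with no parent."""
    current = doc_id
    seen = set()  # guard against accidental cycles in the map
    while current in parents:
        if current in seen:
            raise ValueError(f"cycle in parent chain at {current!r}")
        seen.add(current)
        current = parents[current]
    return current


def grandchild_url(base: str, doc_id: str, parents: dict) -> str:
    """Build a URL with parent set to the direct parent and routing set to the root."""
    params = {
        "parent": parents[doc_id],
        "routing": root_ancestor(doc_id, parents),
    }
    return f"{base}/{doc_id}?{urlencode(params)}"


url = grandchild_url("http://localhost:9200/my_index/my_type", "baz", PARENTS)
print(url)  # http://localhost:9200/my_index/my_type/baz?parent=bar&routing=foo
```

Every generation below the top then carries routing=foo, so the whole family hashes the same value and stays on one shard.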
Does that make sense?