The reason for this has to do with how document routing is done (how we determine which shard a particular document lives on).
By default, routing is done by taking the id of the document (as specified in the URL path when indexing), hashing that value, and then taking the result modulo the number of shards (the remainder when we integer-divide the hash value by the number of shards). So, for example, if I create an index with 5 shards and index a document with an id of foo, then we do the following:
shardNum = hash("foo") % 5
Which gives us a number between 0 and 4. Let's say in this case shardNum is 3, so the document will be indexed on shard 3 of that index. The same applies when you GET a document. You send a request like the following:
curl -XGET "http://localhost:9200/my_index/my_type/foo"
From just this information we need to know which shard to go and get the document from, so we use the same formula as above: we hash "foo" and take the result modulo the number of shards (5), which will again be 3, and we can go to the correct shard to get the document.
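As a rough illustration of the formula above, here is a minimal Python sketch. It assumes `zlib.crc32` as a stand-in hash; Elasticsearch uses its own internal hash function, so the shard numbers this prints will not match a real cluster. The helper name `shard_for` is hypothetical.

```python
import zlib


def shard_for(routing_value: str, num_shards: int) -> int:
    """Hash the routing value and take it modulo the number of shards."""
    # Assumption: crc32 stands in for Elasticsearch's internal hash function.
    return zlib.crc32(routing_value.encode("utf-8")) % num_shards


# Indexing and a later GET both hash the same id against the same shard count,
# so they always agree on which shard holds the document.
shard_at_index_time = shard_for("foo", 5)
shard_at_get_time = shard_for("foo", 5)
print(shard_at_index_time, shard_at_get_time)
```

The point is only that the result depends on nothing but the routing value and the shard count, which is why a GET can find the document without asking every shard.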
With parent-child documents it works slightly differently: when indexing a child document, instead of using the id as the routing value that we hash, we use the parent query parameter. This means that both the parent and child end up on the same shard, because the result of the routing algorithm will be the same for both.
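To make that concrete, here is a small sketch of how the routing value is chosen when a parent is given. It is a simplified model of the behaviour described here, not Elasticsearch's code; `routing_value` and `shard_for` are hypothetical helpers, and crc32 is again only a stand-in hash.

```python
import zlib
from typing import Optional

NUM_SHARDS = 5


def shard_for(value: str, num_shards: int = NUM_SHARDS) -> int:
    # Assumption: crc32 stands in for Elasticsearch's internal hash function.
    return zlib.crc32(value.encode("utf-8")) % num_shards


def routing_value(doc_id: str, parent: Optional[str] = None) -> str:
    """Return the value that gets hashed: the parent id if given, else the doc id."""
    return parent if parent is not None else doc_id


# The parent "foo" is routed by its own id.
parent_shard = shard_for(routing_value("foo"))
# The child "bar" is routed by parent=foo, so the same value is hashed.
child_shard = shard_for(routing_value("bar", parent="foo"))
print(parent_shard, child_shard)
```

Both calls hash "foo", so the two shard numbers are always equal, whatever hash function is used.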
When you index a grandchild, the parent query parameter points to the child document, and we have no reference in the request that there is a parent above that child. Your request in that case might look like this:
curl -XGET "http://localhost:9200/my_index/my_type/baz?parent=bar"
Now, because the request does not mention that the parent bar might have a parent of its own, Elasticsearch will use bar as the routing value in the routing algorithm, and the document may end up on a different shard from its parent (and grandparent), which had foo used as its routing value (I am assuming here that the document with id bar was indexed referencing a parent foo).
So we need to tell Elasticsearch that it should not use the parent as the routing value for the grandchild document but should instead use the id of the document at the top generation (in this case "foo"). We have to do this by setting a custom routing value:
curl -XGET "http://localhost:9200/my_index/my_type/baz?parent=bar&routing=foo"
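The three generations can be sketched together as follows. This is a simplified model under the same assumptions as before (crc32 as a stand-in hash, hypothetical helper names), with an explicit routing value taking precedence over parent, as described above.

```python
import zlib
from typing import Optional

NUM_SHARDS = 5


def shard_for(value: str, num_shards: int = NUM_SHARDS) -> int:
    # Assumption: crc32 stands in for Elasticsearch's internal hash function.
    return zlib.crc32(value.encode("utf-8")) % num_shards


def routing_value(doc_id: str, parent: Optional[str] = None,
                  routing: Optional[str] = None) -> str:
    """Explicit routing wins, then parent, then the document's own id."""
    if routing is not None:
        return routing
    if parent is not None:
        return parent
    return doc_id


grandparent_shard = shard_for(routing_value("foo"))
child_shard = shard_for(routing_value("bar", parent="foo"))
# Without routing, the grandchild hashes "bar" and may land on another shard.
unrouted_shard = shard_for(routing_value("baz", parent="bar"))
# With routing=foo, the grandchild hashes "foo", like the two generations above it.
routed_shard = shard_for(routing_value("baz", parent="bar", routing="foo"))
print(grandparent_shard, child_shard, unrouted_shard, routed_shard)
```

Whether the unrouted grandchild actually lands on a different shard depends on the hash and the shard count; the routed one is guaranteed to match because the same value is hashed for all three documents.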
Does that make sense?