I need to index 3 levels (or more) of child-parent.
For example, the levels might be an author, a book, and characters from that book.
However, when indexing more than two levels, there is a problem with has_child and has_parent queries and filters.
If I have 5 shards, I get about one fifth of the results when running a has_parent query on the lowest level (characters) or a has_child query on the second level (books).
My guess is that a book gets indexed to a shard by its parent id, and so resides together with its parent (the author), but a character gets indexed to a shard based on the hash of the book id, which does not necessarily match the shard the book was actually indexed on.
This means that the characters of books by the same author do not necessarily reside on the same shard (which rather cripples the whole child-parent advantage).
Am I doing something wrong? How can I resolve this? I really need complex queries such as "which authors wrote books with female characters".
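To make the guess above concrete, here is a minimal Python sketch of the suspected routing behaviour. It assumes a document's shard is the hash of its routing key modulo the number of shards, and that a child's default routing key is its parent id. `toy_hash`, the ids, and the helper names are made up for illustration; this is not Elasticsearch's real hash function.

```python
NUM_SHARDS = 5

def toy_hash(key: str) -> int:
    # Stand-in hash (an assumption): the real routing hash is different,
    # but any hash shows the same effect.
    return sum(ord(ch) for ch in key)

def shard_for(routing_key: str, num_shards: int = NUM_SHARDS) -> int:
    # Assumed routing rule: shard = hash(routing key) % number of shards.
    return toy_hash(routing_key) % num_shards

author_id = "author-1"                       # hypothetical ids
book_ids = [f"book-{i}" for i in range(5)]

# The author is routed by its own id.
author_shard = shard_for(author_id)

# A book is routed by its parent id (the author), so it sits next to the author.
book_shards = {book_id: shard_for(author_id) for book_id in book_ids}

# A character is routed by ITS parent id (the book id), which says nothing
# about the shard the book was actually stored on.
character_shards = {book_id: shard_for(book_id) for book_id in book_ids}

# Books whose characters happen to share a shard with the author and the book.
colocated = [b for b in book_ids if character_shards[b] == author_shard]

print("author shard:", author_shard)
print("book shards:", book_shards)
print("character shards (per parent book):", character_shards)
print("books whose characters are co-located:", colocated)
```

With these five consecutive ids, only one book in five has its characters on the same shard as the book itself, which would match the "about one fifth of the results" symptom.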
If you're indexing multi-level parent/child documents, you need to use the routing query string option in addition to the parent query string option. The routing value should always contain the id of the first hierarchy level (the author in your case). This way all books from the same author, and the characters of these books, always reside on the same shard.
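As a rough illustration of the advice above, the following Python sketch builds the index request paths for the three levels. The index name `library`, the type names, the ids, and the `index_url` helper are all hypothetical; only the `parent` and `routing` query string parameters come from the answer. It only constructs URLs and does not talk to a cluster.

```python
from urllib.parse import urlencode

def index_url(index, doc_type, doc_id, parent=None, routing=None):
    """Build the path and query string for an index request (hypothetical helper)."""
    params = {}
    if parent is not None:
        params["parent"] = parent
    if routing is not None:
        params["routing"] = routing
    path = f"/{index}/{doc_type}/{doc_id}"
    query = urlencode(params)
    return f"{path}?{query}" if query else path

# Level 1: the author has no parent and is routed by its own id.
author_url = index_url("library", "author", "a1")

# Level 2: the book's parent is the author; routing is the author id as well.
book_url = index_url("library", "book", "b1", parent="a1", routing="a1")

# Level 3: the character's parent is the book, but routing is still the
# id of the first hierarchy level (the author), not the book id.
character_url = index_url("library", "character", "c1", parent="b1", routing="a1")

for url in (author_url, book_url, character_url):
    print("PUT", url)
```

The point of the sketch is the last call: `parent` names the direct parent, while `routing` stays fixed to the top-level id so that the whole family ends up on one shard.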
In the case of books and characters, I could suggest nesting the characters in
the book documents, but I'm guessing your real application isn't about books.
@Martijn, maybe you could add the routing fix to the docs where the parent
parameter is explained?
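For the nesting suggestion, a sketch of what that might look like, written as Python dicts that serialize to JSON request bodies. The field names (`characters`, `name`, `gender`) are invented for the example, and the `string` field type assumes an Elasticsearch version from around the time of this thread; treat it as an outline rather than a tested mapping.

```python
import json

# Hypothetical mapping: books stay children of authors, while characters
# are embedded in each book as nested objects instead of a third level.
mapping = {
    "book": {
        "_parent": {"type": "author"},
        "properties": {
            "title": {"type": "string"},
            "characters": {
                "type": "nested",
                "properties": {
                    "name": {"type": "string"},
                    "gender": {"type": "string"},
                },
            },
        },
    }
}

# "Which authors wrote books with female characters": a has_child query on
# the book type, wrapping a nested query on the embedded characters.
query = {
    "query": {
        "has_child": {
            "type": "book",
            "query": {
                "nested": {
                    "path": "characters",
                    "query": {"term": {"characters.gender": "female"}},
                }
            },
        }
    }
}

print(json.dumps(mapping, indent=2))
print(json.dumps(query, indent=2))
```

With this layout only two parent/child levels remain, so the default parent-based routing is enough and no extra routing value is needed.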