Update Nested Document


(Nicolas Lalevée) #1

First, my context: I have some "topics", each of which contains some "posts". Both are indexed, and today I search them separately [1]. To improve the user experience, I want to search both at once while showing a single page of topic results. When a topic matches, it is added to the results. When a post matches, its owning topic is added to the results and the highlighted post is shown.

For that use case, nested queries seem to be the perfect tool (apart from the post highlighting, which I may be able to help implement).
I don't remember where in the docs, but it was stated that nested documents cannot be updated individually, separately from their parent document. That is a blocker for me, as I would have too many nested documents per root document. I have some topics with more than a thousand posts, and these topics are the ones updated most frequently.

So I dug into the code to understand the limitation and to find out whether what I want to implement is feasible at all. But I didn't find anything meaningful. Could someone explain it to me?

cheers,
Nicolas

[1] http://www.scoop.it/search#q=sport&page=3&offset=10&limit=10&isAfterLastPageNumOfShortest=false


(Clinton Gormley) #2

Hi Nicolas

I don't remember where in the doc but it was stated that nested
document cannot be updated individually, separately from their parent
document. That is a blocker for me, I would have too many nested
document per root document. I have some topics which have more than
thousand posts, and these topics are updated the most frequently.

You may want to look at parent/child queries instead.

http://www.elasticsearch.org/guide/reference/mapping/parent-field.html
http://www.elasticsearch.org/guide/reference/query-dsl/top-children-query.html
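As a rough illustration of what the two pages above describe, here is a minimal sketch of the request bodies involved, built as plain Python dicts. The type names `topic` and `post`, the field `body`, and the index name passed to the helper are assumptions taken from the use case in this thread, not anything the documentation prescribes. The `_parent` mapping and the `top_children` query belong to the Elasticsearch version of that time.

```python
import json

# Hypothetical type and field names, chosen to match the topics/posts use case.
PARENT_TYPE = "topic"
CHILD_TYPE = "post"


def child_mapping():
    """Mapping for the child type: `_parent` names the type of the parent."""
    return {CHILD_TYPE: {"_parent": {"type": PARENT_TYPE}}}


def child_index_path(index, post_id, topic_id):
    """Path used to index one post; `parent` ties it to its topic."""
    return "/%s/%s/%s?parent=%s" % (index, CHILD_TYPE, post_id, topic_id)


def top_children_query(text):
    """Search the posts, but get the owning topics back as hits."""
    return {
        "query": {
            "top_children": {
                "type": CHILD_TYPE,
                "query": {
                    "query_string": {"default_field": "body", "query": text}
                },
                # How the scores of matching children are combined per parent.
                "score": "max",
            }
        }
    }


if __name__ == "__main__":
    print(json.dumps(child_mapping()))
    print(child_index_path("forum", "p42", "t7"))
    print(json.dumps(top_children_query("sport")))
```

With this layout a single post can be indexed or reindexed on its own through the path returned by `child_index_path`, without touching its topic.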

clint


(Nicolas Lalevée) #3

On 8 August 2011 at 14:11, Clinton Gormley wrote:

You may want to look at parent/child queries instead.

http://www.elasticsearch.org/guide/reference/mapping/parent-field.html
http://www.elasticsearch.org/guide/reference/query-dsl/top-children-query.html

From what I have read (again, I don't remember where, maybe it was just user feedback), these kinds of queries are quite slow compared to nested ones. I'll take a closer look at them. Thanks for the pointers.

And after a second, deeper look into the code, I think I finally understand how the BlockJoinQuery works. To be efficient, the documents have to be stored in a specific order, with the children placed immediately before their owning parent. This avoids any lookup in Lucene to get the parent of a child. Quite smart, by the way!
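To make the ordering idea concrete, here is a small pure-Python sketch of it. It is not Lucene code: the sorted list `parent_ids` stands in for the bit set of parent documents, and all names and sample data are made up for illustration. It also hints at why a single child cannot be updated on its own: the children and their parent have to stay together as one contiguous block.

```python
from bisect import bisect_left


def build_block_index(topics):
    """Flatten topics into one doc list: a topic's posts first, then the topic.

    Returns (docs, parent_ids), where parent_ids is the sorted list of
    doc ids that hold a parent (topic) document.
    """
    docs, parent_ids = [], []
    for topic in topics:
        for post in topic["posts"]:
            docs.append(("post", post))
        parent_ids.append(len(docs))
        docs.append(("topic", topic["title"]))
    return docs, parent_ids


def parent_of(child_doc_id, parent_ids):
    """The parent is simply the first parent doc id after the child."""
    return parent_ids[bisect_left(parent_ids, child_doc_id)]


def search_topics_by_post(docs, parent_ids, word):
    """Match on posts, return the titles of the owning topics (no duplicates)."""
    hits = []
    for doc_id, (kind, text) in enumerate(docs):
        if kind == "post" and word in text:
            parent = parent_of(doc_id, parent_ids)
            if parent not in hits:
                hits.append(parent)
    return [docs[p][1] for p in hits]


if __name__ == "__main__":
    topics = [
        {"title": "Sports", "posts": ["football news", "tennis results"]},
        {"title": "Tech", "posts": ["lucene tips"]},
    ]
    docs, parent_ids = build_block_index(topics)
    print(search_topics_by_post(docs, parent_ids, "tennis"))
```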

cheers,
Nicolas


(Clinton Gormley) #4

Hi Nicolas

For what I could read, I don't remember where again (maybe just a user
feedback), these kind of queries are quite slow compared to the nested
ones. I'll look closer to them. Thanks for the pointers.

They are slower. Whether they are too slow or not depends on your
experience with them. ES needs to run two queries: one for the child
docs, then one for the parents. That said, child docs are stored on the
same shard as the parent docs, so that is already optimized.
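The two points above (two queries, and children living on the parent's shard) can be sketched in a few lines of plain Python. This is only a toy model under stated assumptions: `crc32` is a stand-in hash and not the function Elasticsearch uses, the shard count is arbitrary, and the in-memory `shards` structure is invented for the example.

```python
import zlib

NUM_SHARDS = 3  # hypothetical shard count


def shard_for(routing_key, num_shards=NUM_SHARDS):
    """Pick a shard from a routing key (crc32 is only a stand-in hash)."""
    return zlib.crc32(routing_key.encode("utf-8")) % num_shards


def index_docs(topics, posts):
    """topics: {topic_id: title}; posts: [(topic_id, text), ...]."""
    shards = [{"topics": {}, "posts": []} for _ in range(NUM_SHARDS)]
    for topic_id, title in topics.items():
        shards[shard_for(topic_id)]["topics"][topic_id] = title
    for topic_id, text in posts:
        # A post is routed by its parent's id, so it lands on the parent's shard.
        shards[shard_for(topic_id)]["posts"].append((topic_id, text))
    return shards


def search(shards, word):
    """Two steps per shard: match the posts, then fetch their topics locally."""
    results = []
    for shard in shards:
        matched = {tid for tid, text in shard["posts"] if word in text}
        results.extend(shard["topics"][tid] for tid in matched)
    return sorted(results)


if __name__ == "__main__":
    topics = {"t1": "Sports", "t2": "Tech"}
    posts = [("t1", "football news"), ("t2", "lucene tips"), ("t1", "more football")]
    print(search(index_docs(topics, posts), "football"))
```

Because every post shares a shard with its topic, the second step never has to leave the shard where the post matched.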

It's a toss-up between wanting to query parent/child docs together, and
not wanting to reindex the parent doc every time a child is updated :)

clint


(Nicolas Lalevée) #5

On 8 August 2011 at 18:20, Clinton Gormley wrote:

They are slower. Whether they are too slow or not depends on your
experience with them. ES needs to run two queries: one for the child
docs, then one for the parents. That said, child docs are stored on the
same shard as the parent docs, so that is already optimized.

It's a toss-up between wanting to query parent/child docs together, and
not wanting to reindex the parent doc every time a child is updated :)

I guess that last sentence sums up the problem very well.
Now I'll run some tests to help me make a choice :)

Thanks
Nicolas

