Search for topics best matching a phrase (parent/child top_children)


(justin) #1

I want a search that must be very common: return the topics that best match a search phrase.

The topic document has the topic title (and perhaps other fields).
The child docs are the posts in the topic. They have several fields, the most important being the post body itself.

I don't want the search to return children individually, only parents.

I want to score the search by balancing the topic title match against a score from the post bodies.
The whole thing should also be filterable on things such as date ranges, the user who posted, message board name, etc. (attributes of the children or of the parent).

I'm having trouble getting this simple thing to work: returning only parents, ordered by best match based on their own text and the text in their children.


(Dan Tuffery) #2

You'll need to use a bool query and combine a number of should clauses, each containing one of the queries you want.

To score on the post text while returning only parents, use a has_child query.

For the filtering, use a filtered query. To filter on the child docs, use the has_child filter.

You can boost individual query clauses, e.g., you could boost the topic title clause higher than the others so that a document matching on the title tends to rank near the top of the results. Also, the has_child query has a score_mode parameter to help you tune how the child scores feed into the parent's score.
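
A rough sketch of how those pieces could fit together, written as a Python function that builds the request body. Everything named here is an assumption for illustration: the child type "post", the fields "title", "note", "board" and "posted_at", and the boost value. It also assumes an Elasticsearch version that still has the filtered query (it was deprecated in 2.0 in favour of a bool query with a filter clause).

```python
import json


def build_topic_search(phrase, board=None, date_from=None, date_to=None,
                       title_boost=3.0, score_mode="sum"):
    """Build a search body that returns only parent (topic) documents.

    All type and field names are hypothetical; adjust them to your mapping.
    """
    scoring = {
        "bool": {
            "should": [
                # Clause 1: match the phrase against the parent's title, boosted.
                {"match": {"title": {"query": phrase, "boost": title_boost}}},
                # Clause 2: match the phrase against child posts; the parent
                # is returned and score_mode folds the child scores into it.
                {"has_child": {
                    "type": "post",
                    "score_mode": score_mode,
                    "query": {"match": {"note": phrase}},
                }},
            ]
        }
    }

    filters = []
    if board is not None:
        # Attribute of the parent document.
        filters.append({"term": {"board": board}})
    if date_from is not None or date_to is not None:
        # Attribute of the children, so wrap it in a has_child filter.
        date_range = {}
        if date_from is not None:
            date_range["gte"] = date_from
        if date_to is not None:
            date_range["lte"] = date_to
        filters.append({"has_child": {
            "type": "post",
            "query": {"range": {"posted_at": date_range}},
        }})

    if not filters:
        return {"query": scoring}
    return {"query": {"filtered": {
        "query": scoring,
        "filter": {"bool": {"must": filters}},
    }}}


print(json.dumps(build_topic_search("parent child", board="general",
                                    date_from="2014-01-01"), indent=2))
```

Because the search runs against the parent type and the children only appear inside has_child, the hits are always topics, whatever you sort by.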

If you're still having trouble getting it to work, post the query you have so people can give you pointers.


(justin) #3

I've tried two different approaches, neither satisfactory:

Here is one. The problem with it is that if I order the matches by date rather than score, it returns individual posts:

Here is another approach. The problem is that it favours short topics (1 post) containing single words that I'm searching for (highest hit-to-size ratio?). Also, to make this work, I had to repeat the topic title in the post document as another field (this screenshot shows just the inner part of the query, which replaces the inner part in the screenshot above):

You can infer the schema from these little queries, not that it is rocket science: topics are parent documents with just a title and other attributes. Posts are child documents with the body of the post, plus the title repeated if I want to waste that space...

EDIT: I changed "avg" to "sum" and now the second solution favours long topics with many children. I don't really like having to repeat the topic title field in the child, however.
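
To illustrate why switching score_mode flips the bias, here is a small sketch of how the matching children's scores could be folded into one parent score. The function name and the child scores are made up for the example; this mimics the idea behind "avg", "sum" and "max" and is not Elasticsearch's actual scoring code.

```python
def parent_score(child_scores, score_mode):
    """Fold the scores of the matching children into one parent score."""
    if not child_scores:
        return 0.0
    if score_mode == "sum":
        return sum(child_scores)
    if score_mode == "avg":
        return sum(child_scores) / len(child_scores)
    if score_mode == "max":
        return max(child_scores)
    raise ValueError("unknown score_mode: %s" % score_mode)


# Hypothetical scores of the posts that matched in two topics.
short_topic = [2.0]                 # one post, strong match
long_topic = [2.0, 1.0, 0.5, 0.5]   # many posts, mostly weak matches

for mode in ("avg", "sum", "max"):
    print(mode, parent_score(short_topic, mode), parent_score(long_topic, mode))
```

With these numbers "avg" ranks the one-post topic first, "sum" ranks the long topic first, and "max" scores both the same, so "max" might be worth trying if neither bias is wanted.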


(Dan Tuffery) #4

If you remove the title from the child doc and instead search on the title field in the parent document and the note field in the child document (as per your first example), the scores from both query clauses will be combined to make up the overall score.


(justin) #5

Thanks for the help.

