I want to run what must be a very common kind of search: return the topics that best match a query.
The topic document has the topic title (and perhaps other fields).
The child documents are the posts in the topic. They have several fields, the most important being the post body itself.
I don't want the search to return children individually, only parents.
I want to score the search by balancing the match on the topic title against the match on the post bodies.
And the whole thing should be filterable on such things as date ranges, the user who posted, message board name, etc. (attributes of the children or attributes of the parent).
I'm having trouble getting this simple thing to work: returning only parents, ordered by best match based on their own text and the text in their children.
You can boost individual query clauses, i.e., you could boost the topic title query clause higher than the others so that a document matching on the title ranks higher in the results. Also, the has_child query has a score_mode parameter that controls how the scores of matching children are combined into the parent's score.
If you're still having trouble getting it to work, post the query you have so people can give pointers to help.
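As a rough illustration of the suggestion above, here is a minimal sketch that builds such a query body as a plain Python dict. It assumes a child type called `post` with a `note` field and a parent field called `title`; the boost value of 3.0 is arbitrary, and the helper name `topic_query` is made up for this example. The exact shape of has_child varies between Elasticsearch versions, so treat this as a starting point rather than a definitive query.

```python
import json

def topic_query(text, title_boost=3.0, score_mode="sum"):
    """Build a query body that returns parent topics only.

    Assumed names (not confirmed by the thread): child type "post",
    child field "note", parent field "title".
    """
    return {
        "query": {
            "bool": {
                "should": [
                    # Match on the parent's title, weighted more heavily.
                    {"match": {"title": {"query": text, "boost": title_boost}}},
                    # Match on child posts; score_mode decides how the
                    # scores of matching children feed into the parent.
                    {
                        "has_child": {
                            "type": "post",
                            "score_mode": score_mode,
                            "query": {"match": {"note": text}},
                        }
                    },
                ]
            }
        }
    }

print(json.dumps(topic_query("parent child scoring"), indent=2))
```

Because the top-level query runs against the parent documents, the hits are topics, never individual posts. Raising or lowering `title_boost` shifts the balance between title matches and post matches.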
I've tried two different approaches, neither satisfactory.
Here is one. The problem with it is that if I order the matches by date instead of by score, it returns individual posts:
Here is another. The problem is that it favours short topics (1 post) containing single words that I'm searching for (highest hit-to-size ratio?). Also, to make this work, I had to repeat the topic title in the post document as another field (this screenshot shows just the inner part of the query, which substitutes for the inner part in the screenshot above):
You can infer the schema from these little queries, not that it is rocket science: topics are parent documents, with just a title and other attributes. Posts are child documents, with the body of the post and the title repeated, if I want to waste that space...
EDIT: I changed "avg" to "sum" and now the second solution favours long topics with many children. I don't really like having to repeat the topic title field in the child, however.
If you remove the title from the child doc and search on the title field in the parent document and the note field in the child document (as per your first example), the score from each query clause will be used to make up the overall score.
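To tie this back to the filtering requirement in the original question, the sketch below puts the two scoring clauses in `should` and the restrictions in a bool `filter`, which does not affect the score. Everything named here is an assumption for illustration: the child type `post`, the fields `board` (on the topic), `user` and `posted_at` (on the post), and the helper `filtered_topic_query`. The bool `filter` clause is also only available in newer Elasticsearch versions, so older clusters would need the equivalent filtered query instead.

```python
import json

def filtered_topic_query(text, board=None, author=None, posted_after=None):
    """Score topics on title + child posts, restricted by optional filters.

    All field and type names are hypothetical.
    """
    filters = []
    if board is not None:
        # Attribute of the parent topic.
        filters.append({"term": {"board": board}})

    child_filters = []
    if author is not None:
        child_filters.append({"term": {"user": author}})
    if posted_after is not None:
        child_filters.append({"range": {"posted_at": {"gte": posted_after}}})
    if child_filters:
        # Attributes of the children: keep topics with at least one
        # post that satisfies all of the child filters.
        filters.append(
            {"has_child": {"type": "post",
                           "query": {"bool": {"filter": child_filters}}}}
        )

    bool_query = {
        "should": [
            {"match": {"title": text}},
            {"has_child": {"type": "post", "score_mode": "sum",
                           "query": {"match": {"note": text}}}},
        ],
        # Once a filter is present, should clauses become optional
        # unless this is set explicitly.
        "minimum_should_match": 1,
    }
    if filters:
        bool_query["filter"] = filters
    return {"query": {"bool": bool_query}}

print(json.dumps(filtered_topic_query("scoring", board="general",
                                      posted_after="2015-01-01"), indent=2))
```

One caveat with this layout: the filtering has_child and the scoring has_child are evaluated independently, so the post that passes the date or user filter need not be the same post that matches the text.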