I want to use ES to build a list of related documents for a given
document. However, I don't want to rely on the default more-like-this
results; I'd like a little more control. Each of my documents has three
sets of tags (arrays of strings) and one "primary tag" (a single string).
I currently find related articles with a query in which other articles
"must" match the primary tag and "should" match any of the other three
sets of tags. This works quite well, but I was wondering whether I could
add the article contents to the algorithm.
Can I add a more-like-this query to a boolean "should" clause? And if so,
can I pass a document ID (as with the _mlt API) instead of a string for
"like_text"?
It is possible to wrap MoreLikeThis queries in a boolean query. Put the
mlt query on your primary tag field into a "must" clause and the mlt
queries on your other tag fields into "should" clauses. To start from a
specific document, fetch it by ID with a GetRequest, then read the
contents of the primary tag field and the other tag fields and use them
to fill the "like_text" parameters of the clauses.
Jörg
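The approach described above could be sketched roughly as follows. This is a minimal illustration, not a definitive implementation: the field names (primary_tag, tags_a, tags_b, tags_c, body) are hypothetical, the source document is assumed to have been fetched by ID already and is passed in as a plain dict, and the "like_text" parameter reflects the more_like_this query syntax discussed in this thread (other Elasticsearch versions may spell it differently). The must_not clause that excludes the source document itself is an addition for convenience, not something the answer mentions.

```python
import json

# Hypothetical field names; adjust to your own mapping.
TAG_FIELDS = ("tags_a", "tags_b", "tags_c")


def mlt_clause(field, text):
    """Build one more_like_this clause for a single field."""
    return {
        "more_like_this": {
            "fields": [field],
            "like_text": text,
            # Tags are short, so do not require repeated or common terms.
            "min_term_freq": 1,
            "min_doc_freq": 1,
        }
    }


def build_related_query(doc_id, source):
    """Build a bool query from the _source of an already fetched document."""
    should = []
    for field in TAG_FIELDS:
        tags = source.get(field) or []
        if tags:
            should.append(mlt_clause(field, " ".join(tags)))
    # Article contents join the "should" clauses, as asked in the question.
    if source.get("body"):
        should.append(mlt_clause("body", source["body"]))
    return {
        "query": {
            "bool": {
                "must": [mlt_clause("primary_tag", source["primary_tag"])],
                "should": should,
                # Assumption: keep the source document out of its own results.
                "must_not": [{"ids": {"values": [doc_id]}}],
            }
        }
    }


if __name__ == "__main__":
    example = {
        "primary_tag": "search",
        "tags_a": ["lucene", "elasticsearch"],
        "tags_b": [],
        "tags_c": ["java"],
        "body": "Building related article lists with a boolean query.",
    }
    print(json.dumps(build_related_query("42", example), indent=2))
```

The resulting dict can be sent as the body of a normal search request. Because the clauses are built from text read out of the fetched document, this takes two round trips (one get, one search) rather than passing a document ID directly.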
(In reply to Nick Dunn's post of Monday, November 19, 2012 4:23:21 PM UTC+1.)