On Friday, December 7, 2012 2:24:54 PM UTC-8, Mike wrote:
What is the difference between the mlt query and the query_string and
multi_match queries? Besides the fact that query_string goes through the
query parsing process, I don't see how More Like This differs from these two
query types. All three of them let me specify multiple fields to search
against and boost those fields differently. Am I missing something that mlt
does differently for scoring under the covers, compared with query_string
and multi_match?

On Saturday, December 8, 2012 12:49:26 AM UTC-5, Anil Rhemtulla wrote:
Still learning about all this myself, but I do know that MLT is quite
different: it looks at the terms in the source document, whereas with
query_string you would only search on a few words of the document
(assuming you have a non-trivial document).

On Monday, December 10, 2012 11:58:07 AM UTC-5, Mike wrote:
So are you saying that MLT uses tf*idf for relevance scoring, while
query_string and multi_match just check for the existence of the terms,
as in a boolean query? Isn't that essentially what the other two queries
do when the default operator is set to OR? If MLT is the only query that
uses tf*idf, then what do query_string and multi_match do differently for
scoring?

MLT, query_string and multi_match all use tf*idf. As Shay said, you can
think of MLT as a big boolean query. MLT just has a lot of "knobs" you can
tweak that affect the resulting boolean query: you can remove very frequent
and very rare terms from the query, boost terms that occur in your query
text multiple times, set what percentage of the query terms must match, and
so on.
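
The knobs described in that reply can be sketched the same way. This is a simplified model under stated assumptions, not Elasticsearch's implementation: the hypothetical `build_mlt_query` drops terms that are too rare in the source text (`min_term_freq`) or too rare or too common in the index (`min_doc_freq`, `max_doc_freq`), keeps the top `max_query_terms` by tf*idf, optionally weights each clause relative to the best term (`boost_terms`), and requires a fraction `percent_terms_to_match` of the clauses to match. The parameter names are borrowed from the mlt query options of that era, but the defaults, the `tokenize` analyzer and the simplified scoring (a plain sum of tf*idf, without Lucene's length norms or coordination factor) are assumptions. Once built, the clauses are scored like an ordinary OR query, which is the point of the reply: the scoring model is the same tf*idf.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Stand-in analyzer (assumption): lowercase, split on non-word characters.
    return re.findall(r"\w+", text.lower())


def idf(term, df, n_docs):
    # Classic Lucene-style idf (assumption): 1 + ln(N / (df + 1)).
    return 1.0 + math.log(n_docs / (df[term] + 1))


def build_mlt_query(like_text, df, n_docs, min_term_freq=1, min_doc_freq=1,
                    max_doc_freq=None, max_query_terms=25,
                    percent_terms_to_match=0.3, boost_terms=False):
    """Hypothetical: turn like_text into weighted OR clauses plus a minimum-match count."""
    tf = Counter(tokenize(like_text))
    candidates = []
    for term, freq in tf.items():
        if freq < min_term_freq:
            continue  # too rare in the source text
        if df[term] < min_doc_freq:
            continue  # too rare in the index
        if max_doc_freq is not None and df[term] > max_doc_freq:
            continue  # too common in the index, e.g. "the"
        candidates.append((term, freq * idf(term, df, n_docs)))
    # Keep the highest tf*idf terms; ties broken alphabetically.
    candidates.sort(key=lambda c: (-c[1], c[0]))
    candidates = candidates[:max_query_terms]
    if not candidates:
        return {}, 0
    best = candidates[0][1]
    # boost_terms: weight each clause by its score relative to the best term.
    clauses = {t: (s / best if boost_terms else 1.0) for t, s in candidates}
    min_match = max(1, int(len(clauses) * percent_terms_to_match))
    return clauses, min_match


def search(corpus, like_text, **knobs):
    """Run the generated clauses as an OR query scored by summed tf*idf."""
    docs = [Counter(tokenize(d)) for d in corpus]
    df = Counter()
    for d in docs:
        df.update(d.keys())  # document frequency: one count per document
    clauses, min_match = build_mlt_query(like_text, df, len(docs), **knobs)
    hits = []
    for i, d in enumerate(docs):
        matched = [t for t in clauses if d[t] > 0]
        if not matched or len(matched) < min_match:
            continue  # not enough clauses matched
        score = sum(clauses[t] * d[t] * idf(t, df, len(docs)) for t in matched)
        hits.append((i, score))
    return sorted(hits, key=lambda h: -h[1])


corpus = [
    "elasticsearch more like this query",
    "elasticsearch query string query",
    "the cat sat on the mat",
    "the dog ate the bone",
    "the more the merrier",
]
like_text = "the elasticsearch more like this query"
hits = search(corpus, like_text, max_doc_freq=2, percent_terms_to_match=0.5)
print(hits)
```

With `percent_terms_to_match=0.5`, "the more the merrier" matches only one of the five clauses and is left out; with `percent_terms_to_match=0.0` it comes back as a low-scoring match, much as it would for a plain OR query over the same terms.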