A deeper understanding of term vectors and the more like this query

bbaronas · January 28, 2016, 3:58pm

I am trying to gain a deeper understanding of the more_like_this query. As I understand it, the more_like_this query derives term vectors either from the string fields specified or from all string fields of the given document or documents, and then uses the selected terms as a query to find other documents in the index. Further, I understand that these terms are selected using tf/idf, computed from values found not only in that field of the source document, but also across the same field in every document that contains it.
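To make the setup concrete, here is a minimal sketch of the kind of request body I mean, built as a plain Python dict. The index name `articles`, the document id `1`, and the field names `title` and `body` are made up for illustration; the values shown for `max_query_terms` and `min_term_freq` are what I believe the documented defaults to be.

```python
import json


def build_mlt_query(index, doc_id, fields, max_query_terms=25, min_term_freq=2):
    """Build a more_like_this request body that "likes" one indexed document.

    The index, id, and field names passed in are hypothetical placeholders.
    """
    return {
        "query": {
            "more_like_this": {
                # Fields whose terms are analyzed and compared.
                "fields": list(fields),
                # The document whose terms seed the generated query.
                "like": [{"_index": index, "_id": doc_id}],
                # Ignore terms that occur fewer times than this in the source doc.
                "min_term_freq": min_term_freq,
                # Upper bound on how many selected terms go into the query.
                "max_query_terms": max_query_terms,
            }
        }
    }


body = build_mlt_query("articles", "1", ["title", "body"])
print(json.dumps(body, indent=2))
```

The dict can be serialized and sent as the body of a search request; no client library is assumed here.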

First, is this a correct understanding of how the more_like_this query works?

Next, if that is the case, does that mean more_like_this essentially ignores other terms that exist in the field but did not make the max_query_terms cut?
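To illustrate what I think is happening, here is a simplified model of the term selection in Python. It is my own sketch, not the actual Lucene code: the scoring formula `tf * (1 + ln(num_docs / (doc_freq + 1)))` and all the sample numbers are assumptions made for the example.

```python
import math
from collections import Counter


def select_terms(doc_tokens, doc_freq, num_docs, max_query_terms=25,
                 min_term_freq=2, min_doc_freq=5, max_doc_freq=None,
                 stop_words=frozenset()):
    """Return up to max_query_terms terms ranked by a tf*idf-style score.

    Simplified, assumed model of more_like_this term selection.
    """
    term_freq = Counter(doc_tokens)
    scored = []
    for term, tf in term_freq.items():
        df = doc_freq.get(term, 0)
        # Filters applied before any scoring happens.
        if term in stop_words or tf < min_term_freq:
            continue
        if df < min_doc_freq:
            continue
        if max_doc_freq is not None and df > max_doc_freq:
            continue
        idf = 1.0 + math.log(num_docs / (df + 1))
        scored.append((tf * idf, term))
    # Highest score first; ties broken alphabetically for determinism.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [term for _, term in scored[:max_query_terms]]


# Made-up document and index statistics.
tokens = ["search"] * 3 + ["the"] * 4 + ["vector"] * 2 + ["query"]
doc_freq = {"search": 50, "the": 1000, "vector": 10, "query": 200}
selected = select_terms(tokens, doc_freq, num_docs=1000, max_query_terms=2)
print(selected)
```

In this model, "query" is dropped by `min_term_freq` and "the" loses out on score, so neither would appear in the generated query even though both occur in the field.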

Finally, assuming the above is correct, what are some suggestions for maximizing "quality" terms beyond providing a list of stopwords to exclude? Should I use max_doc_freq in a manner similar to cutoff_frequency in the common terms query?
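For example, this is the kind of tuning I have in mind, again sketched as a Python dict. The index name, document id, field name, and the threshold of 500 are made up. One difference I am aware of is that `max_doc_freq` is an absolute document count, whereas `cutoff_frequency` can also be given as a fraction.

```python
def build_filtered_mlt(index, doc_id, fields, max_doc_freq, stop_words=()):
    """Build a more_like_this body that filters out very common terms.

    All names and thresholds passed in are hypothetical placeholders.
    """
    mlt = {
        "fields": list(fields),
        "like": [{"_index": index, "_id": doc_id}],
        # Skip terms that appear in more than this many documents.
        "max_doc_freq": max_doc_freq,
        # Skip terms that appear in fewer than this many documents.
        "min_doc_freq": 5,
        # Skip very short tokens, which are often low-value.
        "min_word_length": 3,
    }
    if stop_words:
        # Explicit stop word list, applied on top of the frequency filter.
        mlt["stop_words"] = list(stop_words)
    return {"query": {"more_like_this": mlt}}


filtered = build_filtered_mlt("articles", "1", ["body"], max_doc_freq=500,
                              stop_words=["the", "and"])
print(filtered)
```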

I am very new to search and I am trying to gain some insight on how some of these query tools work.

Thank you.
