How exactly does max_query_terms work in an MLT query?

lok63 · May 18, 2020, 11:47am

As the official documentation states, max_query_terms is the MAX number of query terms that will be selected. I would like to know how this works.

For example, if I set max_query_terms = 6 and use the explain parameter, I can see that for some results only 2 or 3 words were used to calculate the BM25 score. I really want to understand how this works: when does ES decide to use only 2 words rather than the maximum number I defined?
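For reference, a request of the kind described here might look roughly like the sketch below. It only builds the search body as a Python dict and prints it as JSON. The field names `title` and `body` and the `like` text are hypothetical placeholders, not taken from the actual index.

```python
import json

# Sketch of a search body for a more_like_this query with explain enabled.
# "title", "body", and the "like" text are hypothetical placeholders.
search_body = {
    "explain": True,  # ask ES to return the scoring explanation for each hit
    "query": {
        "more_like_this": {
            "fields": ["title", "body"],
            "like": "some example text to find similar documents for",
            "max_query_terms": 6,  # upper bound on selected terms, not a guarantee
        }
    },
}

print(json.dumps(search_body, indent=2))
```

The printed JSON can be sent as the body of a `_search` request against the index in question.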

polyfractal · May 18, 2020, 4:54pm

If you'd like to poke around the code, that limit is used here: https://github.com/elastic/elasticsearch/blob/master/server/src/main/java/org/elasticsearch/common/lucene/search/XMoreLikeThis.java#L728

My understanding is that the query looks up the term frequencies for a particular field, then builds a priority queue sized to either max_query_terms or the number of terms in the field, whichever is smaller. The priority queue determines which terms are added to the final boolean query that MLT creates.

So if a field only has two different values across all the docs (on the shard), the queue will be sized to 2 rather than max_query_terms. Fields with higher cardinality hit the limit instead, so the queue is capped at max_query_terms.
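To make the sizing behaviour concrete, here is a minimal Python sketch of a bounded priority queue that keeps at most max_query_terms terms. It is not the XMoreLikeThis code. The function name `select_query_terms` and the per-term scores are made up for illustration, and how the real implementation scores each term is not modelled here.

```python
import heapq

def select_query_terms(term_scores, max_query_terms):
    """Keep at most max_query_terms of the highest-scoring terms.

    term_scores maps term -> score; the scores are illustrative inputs,
    not the values the real implementation computes.
    """
    # The queue is sized to whichever is smaller: the limit or the term count.
    limit = min(max_query_terms, len(term_scores))
    heap = []  # min-heap of (score, term); the weakest kept term is heap[0]
    for term, score in term_scores.items():
        if len(heap) < limit:
            heapq.heappush(heap, (score, term))
        elif heap and (score, term) > heap[0]:
            # Queue is full: evict the weakest term in favour of a stronger one.
            heapq.heapreplace(heap, (score, term))
    # Return the kept terms, best first.
    return [term for score, term in sorted(heap, reverse=True)]

# A field with only two distinct terms: the queue holds 2, not 6.
few_terms = {"elasticsearch": 3.2, "query": 1.1}
# A field with eight distinct terms: the queue is capped at 6.
many_terms = {f"t{i}": float(i) for i in range(1, 9)}

selected_few = select_query_terms(few_terms, 6)
selected_many = select_query_terms(many_terms, 6)
print(selected_few)
print(selected_many)
```

With max_query_terms = 6, the first call can return only two terms because there is nothing else to select, while the second drops the two lowest-scoring terms.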

*Caveat: not an expert at MLT, so grain of salt

