MoreLikeThis query, what does percent_terms_to_match do?

Nick_Dunn · May 21, 2012, 4:30pm

I'm trying out morelikethis
(http://www.elasticsearch.org/guide/reference/query-dsl/mlt-query.html) and
it's working well. So easy.

I'm finding entries by related tags, so I dropped min_doc_freq to 1 (one
or more tags required for a match) and max_query_terms to 100 (an entry
could be tagged with up to 100 tags). However, from the docs it's not clear
to me what percent_terms_to_match does:

The percentage of terms to match on (float value). Defaults to 0.3 (30
percent).

Could someone explain this in other words please, perhaps an example of
what might happen if I increase or decrease from the default? When I try it
on my sample data it increases/reduces the number of hits and doesn't seem
to affect the score of each hit, so I'm just not sure what it's doing.

Thanks.

kimchy · May 23, 2012, 10:30pm

Effectively, what happens in the more like this query is that it builds a
big boolean query with should clauses for each term. The percent terms to
match means that out of all the terms built, at least X percent should
match (effectively, setting the minimum_should_match parameter on it).
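
To make the description above concrete, here is a rough sketch in Python of the kind of boolean query that gets built. It is not the actual Elasticsearch implementation: the helper name mlt_as_bool_query, the "tags" field, and the sample terms are made up for illustration, and it assumes the required count is the percentage of the clause count rounded down.

```python
import math


def mlt_as_bool_query(field, terms, percent_terms_to_match=0.3):
    """Build a bool query roughly like the one more_like_this generates.

    Hypothetical helper for illustration only.
    """
    # One "should" clause per selected term.
    should = [{"term": {field: term}} for term in terms]
    # Assumption: the required number of matching clauses is the
    # percentage of the clause count, rounded down. round() guards
    # against floating-point noise before flooring.
    minimum = math.floor(round(len(should) * percent_terms_to_match, 9))
    return {"bool": {"should": should, "minimum_should_match": minimum}}


query = mlt_as_bool_query(
    "tags",
    ["search", "lucene", "java", "json", "rest",
     "nosql", "scaling", "index", "shard", "replica"],
)
print(query["bool"]["minimum_should_match"])  # 3 of the 10 clauses must match
```

With 10 terms and the default of 0.3, a document has to match at least 3 of the should clauses to be returned at all.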


Nick_Dunn · May 24, 2012, 8:31am

Ah that makes sense, thanks Shay.

So increasing percent_terms_to_match to 0.5 means that if I have 20 tags, 50% (10) of these should match in the other document for it to be returned. Increasing the value increases precision, while decreasing it decreases precision but increases recall.
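
The 20-tag example can be played through with a small simulation. This is only a sketch of the filtering effect, not how Elasticsearch scores documents: the function similar_docs, the candidate documents, and the use of the raw overlap count as a stand-in for the score are all assumptions made for illustration.

```python
def similar_docs(source_tags, candidates, percent_terms_to_match):
    """Return {doc_id: overlap} for candidates that pass the threshold.

    Hypothetical helper; the overlap count stands in for a real score.
    """
    source = set(source_tags)
    # Assumption: required matches = percentage of the term count,
    # rounded down, and at least one term always has to match.
    required = max(int(round(len(source) * percent_terms_to_match, 9)), 1)
    hits = {}
    for doc_id, tags in candidates.items():
        overlap = len(source & set(tags))
        if overlap >= required:
            hits[doc_id] = overlap
    return hits


source_tags = [f"tag{i}" for i in range(20)]
candidates = {
    "a": [f"tag{i}" for i in range(15)],  # shares 15 tags with the source
    "b": [f"tag{i}" for i in range(10)],  # shares 10
    "c": [f"tag{i}" for i in range(6)],   # shares 6
    "d": [f"tag{i}" for i in range(2)],   # shares 2
}

for pct in (0.1, 0.3, 0.5, 0.75):
    print(pct, sorted(similar_docs(source_tags, candidates, pct)))
# 0.1  ['a', 'b', 'c', 'd']
# 0.3  ['a', 'b', 'c']
# 0.5  ['a', 'b']
# 0.75 ['a']
```

Raising the value only removes weaker matches from the result set. The documents that remain keep the same overlap, which fits the observation that the number of hits changes while the score of each hit does not.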

Cheers.


kimchy · May 25, 2012, 10:44pm

Yep.

