MoreLikeThis percent_terms_to_match


(Justin Treher) #1

Hello,

I have a query like this with 0.93. I don't quite understand what
percent_terms_to_match is doing here. Although I have it set to 0.7, the
"explain" output clearly shows that the hit matches only one of the three
terms in the like text. Why isn't this hit filtered out when only 1 of 3
words matches? My interpretation of the documentation is that it builds a bool
query with every term in the like_text and then sets a minimum should
match based on the percentage. In this case, with three words, it should
never return results unless all three match. However, I suspect my
interpretation is wrong. Thanks!

{
  "query": {
    "more_like_this": {
      "fields": [
        "title_alias"
      ],
      "like_text": "fish tree lounge",
      "min_term_freq": 0,
      "max_query_terms": 25,
      "percent_terms_to_match": 0.7,
      "min_doc_freq": 1,
      "analyzer": "standard"
    }
  },
  "explain": true
}

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.


(Mark Harwood) #2

It looks like the percentage of terms to match is based on the number of input
terms that exist in a shard with the required frequency, not on the total
number of words in the input string.
Terms that have zero doc frequency on a shard (i.e. do not even exist in the
index) are not added to the final boolean query, so on that shard you may
have only one or two query clauses rather than three. The logic on that shard
is then that 70% of the 2 clauses relevant to that shard must match. 70% of 2
is 1.4, which apparently works out to a minimum of 1 clause, giving the
possibility of a match on a single term.
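
The arithmetic described above can be sketched in a few lines of Python. This is only an illustration, not the actual Elasticsearch or Lucene code: the function names and the per-shard document frequencies are made up, and the rule that the product is truncated toward zero is an assumption inferred from the observed behaviour (70% of 2 clauses allowing a single-term match).

```python
def clauses_on_shard(like_terms, shard_doc_freq, min_doc_freq=1):
    """Keep only the terms whose doc frequency on this shard meets min_doc_freq."""
    return [t for t in like_terms if shard_doc_freq.get(t, 0) >= min_doc_freq]


def min_should_match(clause_count, percent_terms_to_match):
    """ASSUMPTION: the required clause count is the product truncated toward zero."""
    return int(clause_count * percent_terms_to_match)


like_terms = ["fish", "tree", "lounge"]

# Hypothetical per-shard doc frequencies, invented for illustration:
# "lounge" does not occur on this shard at all.
shard_doc_freq = {"fish": 12, "tree": 4}

clauses = clauses_on_shard(like_terms, shard_doc_freq)
required = min_should_match(len(clauses), 0.7)

print(clauses)   # terms that survive on this shard
print(required)  # how many of them must match under the assumed rule
```

Under these assumptions the shard ends up with two clauses and requires only one of them, so a document containing just one of the three words can still be returned. With all three terms present on the shard, the same rule would require two matches, not three.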

Cheers
Mark

(In reply to Justin Treher's message of Thursday, October 24, 2013 7:30:04 PM UTC+1.)


(Justin Treher) #3

Thanks. I suspected something like this was happening, but I was not sure
what the parameter and the query as a whole were intended to do. It doesn't
do what I thought it did, so I will stick with a match query and its
minimum_should_match parameter.
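
For reference, a match query with a minimum-should-match setting might look like the sketch below. It builds the request body in Python and prints it as JSON. The helper name is hypothetical, and "70%" is chosen only to mirror the percent_terms_to_match value from the original query. As far as I understand the documentation, a percentage here is applied to the number of terms produced by analyzing the query text and rounded down, so three terms at 70% would require two matches regardless of which terms exist on a given shard.

```python
import json


def match_with_min_should_match(field, text, minimum_should_match):
    """Build a match query body (hypothetical helper, for illustration only)."""
    return {
        "query": {
            "match": {
                field: {
                    "query": text,
                    "minimum_should_match": minimum_should_match,
                }
            }
        },
        "explain": True,
    }


body = match_with_min_should_match("title_alias", "fish tree lounge", "70%")
print(json.dumps(body, indent=2))
```

The printed JSON can be sent as the body of a search request in the same way as the more_like_this query earlier in the thread.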
(In reply to Mark Harwood's message of Oct 25, 2013 6:27 AM.)

