MLT query delivering strange results

daniel_kummer · November 20, 2014, 11:39am

I have been trying to figure out how exactly the more_like_this query
behaves. The doc says "Under the hood, more_like_this simply creates
multiple should clauses in a bool query of interesting terms extracted from
some provided text." But I found several examples that I could not explain.
This one illustrates it:

I am using elasticsearch-1.4.0. I am creating an index like this (no
mapping defined before):
curl -XPUT 'localhost:9200/twitter/tweet/1' -d '{"user" : "user1", "message" : "aaa"}'
curl -XPUT 'localhost:9200/twitter/tweet/2' -d '{"user" : "user1", "message" : "aaa bbb"}'
curl -XPUT 'localhost:9200/twitter/tweet/3' -d '{"user" : "user1", "message" : "bbb aaa"}'
curl -XPUT 'localhost:9200/twitter/tweet/4' -d '{"user" : "user2", "message" : "bbb"}'
curl -XPUT 'localhost:9200/twitter/tweet/5' -d '{"user" : "user2", "message" : "aaa bbb"}'
curl -XPUT 'localhost:9200/twitter/tweet/6' -d '{"user" : "user2", "message" : "bbb aaa"}'

Then I query it:
curl -XGET 'http://localhost:9200/twitter/tweet/_search?pretty=true&size=10' -d '{
  "query": {
    "more_like_this_field": {
      "message": {
        "like_text": "aaa bbb",
        "percent_terms_to_match": 1,
        "min_term_freq": 1,
        "max_query_terms": 3,
        "min_doc_freq": 1
      }
    }
  }
}'

Response:
{
  "took" : 3,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "hits" : {
    "total" : 5,
    "max_score" : 14.4000225,
    "hits" : [ {
      "_index" : "twitter",
      "_type" : "tweet",
      "_id" : "4",
      "_score" : 14.4000225,
      "_source":{"user" : "user2", "message" : "bbb"}
    }, {
      "_index" : "twitter",
      "_type" : "tweet",
      "_id" : "2",
      "_score" : 12.729599,
      "_source":{"user" : "user1", "message" : "aaa bbb"}
    }, {
      "_index" : "twitter",
      "_type" : "tweet",
      "_id" : "5",
      "_score" : 12.72813,
      "_source":{"user" : "user2", "message" : "aaa bbb"}
    }, {
      "_index" : "twitter",
      "_type" : "tweet",
      "_id" : "3",
      "_score" : 12.728111,
      "_source":{"user" : "user1", "message" : "bbb aaa"}
    }, {
      "_index" : "twitter",
      "_type" : "tweet",
      "_id" : "6",
      "_score" : 12.5501995,
      "_source":{"user" : "user2", "message" : "bbb aaa"}
    } ]
  }
}

So document 1 ("aaa") is missing. I get the same result if I use
"like_text": "bbb aaa" in the above query. However, if I use
"like_text": "aaa", I get what I would expect: all documents except "bbb"
are returned.

What kind of should-query is generated by more_like_this in the above
example? I would have expected:
curl -XGET 'http://localhost:9200/twitter/tweet/_search?pretty=true&size=10' -d '{
  "query": {
    "bool": {
      "should": [
        {
          "match": {
            "message": "aaa"
          }
        },
        {
          "match": {
            "message": "bbb"
          }
        }
      ],
      "minimum_should_match": 2
    }
  }
}'
but this obviously returns neither "aaa" nor "bbb".
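
To make that expectation concrete, here is a minimal Python sketch of how I read the bool query. It is a toy model, not Elasticsearch code: the DOCS dict and the matching_ids helper are names made up for illustration, whitespace splitting stands in for the analyzer, and scoring and sharding are ignored.

```python
# Toy model of "should clauses + minimum_should_match"; not Elasticsearch code.
# DOCS mirrors the six indexed tweets (id -> message field).
DOCS = {
    1: "aaa",
    2: "aaa bbb",
    3: "bbb aaa",
    4: "bbb",
    5: "aaa bbb",
    6: "bbb aaa",
}


def matching_ids(docs, terms, minimum_should_match):
    """Return ids of docs that contain at least minimum_should_match of the terms."""
    hits = []
    for doc_id, message in sorted(docs.items()):
        tokens = set(message.split())  # assumption: whitespace tokenization
        matched = sum(1 for term in terms if term in tokens)
        if matched >= minimum_should_match:
            hits.append(doc_id)
    return hits


# Both terms required, as in the bool query above.
print(matching_ids(DOCS, ["aaa", "bbb"], 2))  # [2, 3, 5, 6]

# A single clause on "bbb" only.
print(matching_ids(DOCS, ["bbb"], 1))  # [2, 3, 4, 5, 6]
```

Under this model, requiring both terms excludes documents 1 and 4 alike. The set of ids actually returned by more_like_this (4, 2, 5, 3, 6) is the same set a single clause on "bbb" would produce here. That is only an observation about the hit set; it does not show which query was really generated.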

Why does the above more_like_this query return "bbb" but not "aaa"?

