MLT query delivering strange results

I have been trying to figure out exactly how the more_like_this query
behaves. The documentation says: "Under the hood, more_like_this simply
creates multiple should clauses in a bool query of interesting terms
extracted from some provided text." However, I found several cases that I
could not explain. The following example illustrates one of them:

I am using Elasticsearch 1.4.0. I create an index by indexing six
documents like this (no mapping defined beforehand):
curl -XPUT 'localhost:9200/twitter/tweet/1' -d '{"user" : "user1", "message" : "aaa"}'
curl -XPUT 'localhost:9200/twitter/tweet/2' -d '{"user" : "user1", "message" : "aaa bbb"}'
curl -XPUT 'localhost:9200/twitter/tweet/3' -d '{"user" : "user1", "message" : "bbb aaa"}'
curl -XPUT 'localhost:9200/twitter/tweet/4' -d '{"user" : "user2", "message" : "bbb"}'
curl -XPUT 'localhost:9200/twitter/tweet/5' -d '{"user" : "user2", "message" : "aaa bbb"}'
curl -XPUT 'localhost:9200/twitter/tweet/6' -d '{"user" : "user2", "message" : "bbb aaa"}'

Then I query it:
curl -XGET 'http://localhost:9200/twitter/tweet/_search?pretty=true&size=10' -d '{
  "query": {
    "more_like_this_field": {
      "message": {
        "like_text": "aaa bbb",
        "percent_terms_to_match": 1,
        "min_term_freq": 1,
        "max_query_terms": 3,
        "min_doc_freq": 1
      }
    }
  }
}'

The response is:
{
  "took" : 3,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "hits" : {
    "total" : 5,
    "max_score" : 14.4000225,
    "hits" : [ {
      "_index" : "twitter",
      "_type" : "tweet",
      "_id" : "4",
      "_score" : 14.4000225,
      "_source":{"user" : "user2", "message" : "bbb"}
    }, {
      "_index" : "twitter",
      "_type" : "tweet",
      "_id" : "2",
      "_score" : 12.729599,
      "_source":{"user" : "user1", "message" : "aaa bbb"}
    }, {
      "_index" : "twitter",
      "_type" : "tweet",
      "_id" : "5",
      "_score" : 12.72813,
      "_source":{"user" : "user2", "message" : "aaa bbb"}
    }, {
      "_index" : "twitter",
      "_type" : "tweet",
      "_id" : "3",
      "_score" : 12.728111,
      "_source":{"user" : "user1", "message" : "bbb aaa"}
    }, {
      "_index" : "twitter",
      "_type" : "tweet",
      "_id" : "6",
      "_score" : 12.5501995,
      "_source":{"user" : "user2", "message" : "bbb aaa"}
    } ]
  }
}

So document 1 ("aaa") is missing. I get the same result if I use
"like_text": "bbb aaa" in the above query. However, if I use
"like_text": "aaa", I get what I would expect: all documents except
document 4 ("bbb") are returned.

What kind of should-query is generated by more_like_this in the above
example? I would have expected:
curl -XGET 'http://localhost:9200/twitter/tweet/_search?pretty=true&size=10' -d '{
  "query": {
    "bool": {
      "should": [
        { "match": { "message": "aaa" } },
        { "match": { "message": "bbb" } }
      ],
      "minimum_should_match": 2
    }
  }
}'
but this obviously returns neither document 1 ("aaa") nor document 4 ("bbb").
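
As a rough illustration of that expectation, the following Python sketch
emulates a bool query with should clauses and minimum_should_match over the
six messages. It is not Elasticsearch: the in-memory DOCS table, the
whitespace tokenization and the helper name should_match are assumptions made
for illustration, and scoring is ignored. The OBSERVED ids are copied from the
more_like_this response shown earlier.

```python
# Hypothetical in-memory stand-in for the six indexed tweets (id -> message).
DOCS = {
    1: "aaa",
    2: "aaa bbb",
    3: "bbb aaa",
    4: "bbb",
    5: "aaa bbb",
    6: "bbb aaa",
}

# Ids returned by the more_like_this query for like_text "aaa bbb".
OBSERVED = [4, 2, 5, 3, 6]


def should_match(docs, terms, minimum_should_match):
    """Return the ids of docs that contain at least
    `minimum_should_match` of the given terms (assumed semantics of
    should clauses; whitespace tokenization, no scoring)."""
    matched = []
    for doc_id, message in sorted(docs.items()):
        tokens = set(message.split())
        hits = sum(1 for term in terms if term in tokens)
        if hits >= minimum_should_match:
            matched.append(doc_id)
    return matched


# Both terms required, as in the bool query with minimum_should_match 2.
both_required = should_match(DOCS, ["aaa", "bbb"], 2)
# Single term, as with like_text "aaa".
only_aaa = should_match(DOCS, ["aaa"], 1)
# Documents the real query returned that the sketch does not predict.
unexpected = sorted(set(OBSERVED) - set(both_required))

print(both_required)  # [2, 3, 5, 6]
print(only_aaa)       # [1, 2, 3, 5, 6]
print(unexpected)     # [4]
```

Under these assumptions the single-term case agrees with what
"like_text": "aaa" returns, while the two-term case predicts documents 2, 3,
5 and 6 only, so document 4 is the one hit the should-clause model does not
account for.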

Why does the above more_like_this query return "bbb" but not "aaa"?
