More_like_this query returns no results unless min_doc_freq increased

I am posting the following query against the _explain endpoint for a document that has "three" in its title field:

{
"query": {
    "bool": {
        "must": [
            {
                "bool": {
                    "should": [
                        {
                            "more_like_this": {
                                "min_doc_freq": 1,
                                "fields": [
                                    "title"
                                ],
                                "max_query_terms": 10,
                                "like": "Three Reasons Spice Girls Will Reunite (And Three Why They Won't)",
                                "min_term_freq": 1
                            }
                        },
                        {
                            "more_like_this": {
                                "min_doc_freq": 1,
                                "fields": [
                                    "description"
                                ],
                                "max_query_terms": 10,
                                "like": "Three Reasons Spice Girls Will Reunite (And Three Why They Won't)",
                                "min_term_freq": 1
                            }
                        }
                    ]
                }
            }
        ]
    }
}
}

The response/explanation I get:

"matched": false,
"explanation": {
    "value": 0,
    "description": "Failure to meet condition(s) of required/prohibited clause(s)",
    "details": [
        {
            "value": 0,
            "description": "no match on required clause (((title:girl title:and title:why title:will title:thei title:won title:reason title:reunit title:spice title:three)~3) ((description:and description:why description:thei description:won description:girl description:will description:reason description:spice description:reunit description:three)~3))",
            "details": [
                {
                    "value": 0,
                    "description": "No matching clauses",
                    "details": []
                }
            ]
        },
        {
            "value": 0,
            "description": "match on required clause, product of:",
            "details": [
                {
                    "value": 0,
                    "description": "# clause",
                    "details": []
                },
                {
                    "value": 0.023566995,
                    "description": "_type:media, product of:",
                    "details": [
                        {
                            "value": 1,
                            "description": "boost",
                            "details": []
                        },
                        {
                            "value": 0.023566995,
                            "description": "queryNorm",
                            "details": []
                        }
                    ]
                }
            ]
        }
    ]
}

However, if I increase min_term_freq from 1 to 2, I get a match. Similarly, I get a match if I remove several terms from the like text, for example "Three Reasons Spice (And Three)".

Why would it match when the criteria get stricter (min_term_freq increases) or when the number of terms that can match decreases?

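The part of MLT you're not setting is minimum_should_match, which controls how many of the selected terms have to match.

The default is 30%. That is the ~3 in your explain output: at least 3 of the 10 selected terms have to match in a field, and your document only matches "three". Maybe that's why raising min_term_freq helps: your selection of terms narrows to words that are repeated in the like text (here only "three" occurs twice), so there are fewer selected terms and you stand a better chance of getting 30% of them to match. Removing terms from the like text shrinks the selection in the same way.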
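
If you want a document to match when it shares only a single term with the like text, one option is to set minimum_should_match explicitly rather than rely on the 30% default. A minimal sketch for the title field only, assuming the rest of your query stays as posted; the value 1 is just an illustration, not a recommendation:

```json
{
  "query": {
    "more_like_this": {
      "fields": ["title"],
      "like": "Three Reasons Spice Girls Will Reunite (And Three Why They Won't)",
      "min_term_freq": 1,
      "min_doc_freq": 1,
      "max_query_terms": 10,
      "minimum_should_match": 1
    }
  }
}
```

With this, one matching term out of the ten selected should be enough, so expect looser matches in general. A percentage such as "10%" is also accepted if you'd rather have the threshold scale with the number of selected terms.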
