More_like_this query returns no results unless min_doc_freq is increased


(Phil Godzin) #1

I am posting the following query to the _explain endpoint for a document that has "three" in its title field:

{
    "query": {
        "bool": {
            "must": [
                {
                    "bool": {
                        "should": [
                            {
                                "more_like_this": {
                                    "min_doc_freq": 1,
                                    "fields": [
                                        "title"
                                    ],
                                    "max_query_terms": 10,
                                    "like": "Three Reasons Spice Girls Will Reunite (And Three Why They Won't)",
                                    "min_term_freq": 1
                                }
                            },
                            {
                                "more_like_this": {
                                    "min_doc_freq": 1,
                                    "fields": [
                                        "description"
                                    ],
                                    "max_query_terms": 10,
                                    "like": "Three Reasons Spice Girls Will Reunite (And Three Why They Won't)",
                                    "min_term_freq": 1
                                }
                            }
                        ]
                    }
                }
            ]
        }
    }
}

The response/explanation I get:

"matched": false,
"explanation": {
    "value": 0,
    "description": "Failure to meet condition(s) of required/prohibited clause(s)",
    "details": [
        {
            "value": 0,
            "description": "no match on required clause (((title:girl title:and title:why title:will title:thei title:won title:reason title:reunit title:spice title:three)~3) ((description:and description:why description:thei description:won description:girl description:will description:reason description:spice description:reunit description:three)~3))",
            "details": [
                {
                    "value": 0,
                    "description": "No matching clauses",
                    "details": []
                }
            ]
        },
        {
            "value": 0,
            "description": "match on required clause, product of:",
            "details": [
                {
                    "value": 0,
                    "description": "# clause",
                    "details": []
                },
                {
                    "value": 0.023566995,
                    "description": "_type:media, product of:",
                    "details": [
                        {
                            "value": 1,
                            "description": "boost",
                            "details": []
                        },
                        {
                            "value": 0.023566995,
                            "description": "queryNorm",
                            "details": []
                        }
                    ]
                }
            ]
        }
    ]
}

However, if I increase min_term_freq from 1 to 2, I get a match. I also get a match if I remove several terms from the like text, for example "Three Reasons Spice (And Three)".

Why would it start matching when the criteria get stricter (min_term_freq increases) or when the number of terms that can match decreases?


(Mark Harwood) #2

The part of MLT you're not setting is minimum_should_match, the bit that controls how many of the selected terms have to match.
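A minimal Python sketch of the question's request body with that setting made explicit. `minimum_should_match` is the real more_like_this parameter; the `mlt_clause` helper and the value "1" are illustrative choices for this sketch, not a recommendation for your data.

```python
import json


def mlt_clause(field, like, minimum_should_match=None):
    """Build one more_like_this clause with the question's settings.

    Hypothetical helper. When minimum_should_match is None the key is
    omitted, so the server-side default applies.
    """
    clause = {
        "min_doc_freq": 1,
        "fields": [field],
        "max_query_terms": 10,
        "like": like,
        "min_term_freq": 1,
    }
    if minimum_should_match is not None:
        clause["minimum_should_match"] = minimum_should_match
    return {"more_like_this": clause}


LIKE = "Three Reasons Spice Girls Will Reunite (And Three Why They Won't)"

# Same must -> bool -> should structure as the original query.
body = {
    "query": {
        "bool": {
            "must": [
                {
                    "bool": {
                        "should": [
                            mlt_clause("title", LIKE, minimum_should_match="1"),
                            mlt_clause("description", LIKE, minimum_should_match="1"),
                        ]
                    }
                }
            ]
        }
    }
}

print(json.dumps(body, indent=2))
```

Lowering the value makes a match on a single selected term (such as "three" in the title) enough; raising it does the opposite.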

The default is 30%, which is where the ~3 in your explanation comes from: 3 of the 10 selected terms must match. Maybe that's why: as you increase min_term_freq, your selection of the top 10 terms focuses more on boring words that are repeated a lot, which means you stand a better chance of getting 30% of them to match.
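The arithmetic behind this can be sketched in Python. The terms are copied from the explain output above (already stemmed, e.g. "thei", "reunit"); the helper names are hypothetical, and the selection step is a simplification: real MLT also filters by min_doc_freq and ranks candidates by tf-idf before keeping max_query_terms. A percentage minimum_should_match is rounded down.

```python
from collections import Counter

# Analyzed terms taken from the explain output; "three" is the only repeat.
FULL = ["three", "reason", "spice", "girl", "will", "reunit",
        "and", "three", "why", "thei", "won"]
# "Three Reasons Spice (And Three)" after the same analysis (assumed).
TRIMMED = ["three", "reason", "spice", "and", "three"]


def selected_terms(tokens, min_term_freq=1, max_query_terms=10):
    """Rough model of MLT term selection (hypothetical, no tf-idf ranking)."""
    counts = Counter(tokens)  # keeps first-seen order
    terms = [t for t, n in counts.items() if n >= min_term_freq]
    return terms[:max_query_terms]


def required_matches(n_terms, percent=30):
    """Percentage minimum_should_match: the computed count is rounded down."""
    return n_terms * percent // 100


cases = [
    ("full like, min_term_freq=1", FULL, 1),
    ("full like, min_term_freq=2", FULL, 2),
    ("trimmed like, min_term_freq=1", TRIMMED, 1),
]
for label, tokens, mtf in cases:
    terms = selected_terms(tokens, mtf)
    print(f"{label}: {len(terms)} selected, {required_matches(len(terms))} must match")
```

With min_term_freq=2 only "three" survives and the rounded-down requirement is 0; a bool query made only of should clauses still needs at least one of them to match, so a title containing "three" is enough. The trimmed text selects 4 terms, so only 1 has to match.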