I am running a more_like_this query as follows:

    "more_like_this" : {
        "fields" : [
            "title", "keywords", "description"
        ],
        "like" : [
            {
                "_index" : "indexName",
                "_id" : "12354",
                "_type" : "Object"
            }
        ],
        "min_term_freq" : 1,
        "max_query_terms" : 12
    }

I notice that when all the fields have values set, I get the matching results I expect for similar documents.
However, if my documents have an empty string for, let's say, the 'description' field, my MLT query always returns 0 results. Even with 3 documents that share the same values for 'title' and 'keywords' and all have an empty description, I get no matches.
I'm struggling to find a combination of term selection parameters that simply ignores a field when it has no value. I'm confused about why it would return no results. The docs say:
The MLT query simply extracts the text from the input document, analyzes it, usually using the same analyzer at the field, then selects the top K terms with highest tf-idf to form a disjunctive query of these terms.
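Read literally, that description amounts to something like the toy sketch below. It is only a model of the quoted sentence, not what Elasticsearch actually does internally: the function name `select_top_terms`, the whitespace tokenizer standing in for the real analyzer, the idf formula, and the document frequencies are all assumptions made up for illustration.

```python
import math
from collections import Counter

def select_top_terms(doc, doc_freq, num_docs, min_term_freq=1, max_query_terms=12):
    """Toy model of MLT term selection: rank (field, term) pairs by tf * idf."""
    scored = []
    for field, text in doc.items():
        # Assumption: lowercase + whitespace split stands in for the field's analyzer.
        term_freqs = Counter(text.lower().split())
        for term, tf in term_freqs.items():
            if tf < min_term_freq:
                continue
            df = doc_freq.get((field, term), 0)
            # Assumption: a classic-style idf; the real scoring may differ.
            idf = 1.0 + math.log((num_docs + 1) / (df + 1))
            scored.append((tf * idf, field, term))
    scored.sort(reverse=True)
    return [(field, term) for _, field, term in scored[:max_query_terms]]

# Hypothetical document with an empty 'description', as in the failing case.
doc = {"title": "red running shoes", "keywords": "shoes running", "description": ""}
doc_freq = {
    ("title", "red"): 1, ("title", "running"): 3, ("title", "shoes"): 3,
    ("keywords", "shoes"): 3, ("keywords", "running"): 3,
}
terms = select_top_terms(doc, doc_freq, num_docs=3)
```

In this model the empty field produces no tokens, so it contributes nothing to the selected terms, which is the behaviour I expected from the real query.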
So fields with no values at all shouldn't affect the generated query, as the top K terms shouldn't include anything from an empty field, or the empty field should at least be ignored? Is there a way to see the query that is created behind the scenes, which might give some insight into where it's going wrong?
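One avenue that may help with inspecting the generated query is the validate API, which accepts `explain` and `rewrite` flags. A minimal sketch of building such a request follows, assuming the index name and document reference from the query above; the helper name `build_validate_request` is hypothetical, and the request is only constructed here, not sent.

```python
import json
from urllib.parse import urlencode

def build_validate_request(index, mlt_clause):
    """Build the path and JSON body for a _validate/query call (nothing is sent)."""
    # explain=true returns an explanation; rewrite=true asks for the rewritten query.
    params = urlencode({"explain": "true", "rewrite": "true"})
    path = f"/{index}/_validate/query?{params}"
    body = json.dumps({"query": {"more_like_this": mlt_clause}})
    return path, body

# Same clause as in the question; values are taken from the post as-is.
mlt_clause = {
    "fields": ["title", "keywords", "description"],
    "like": [{"_index": "indexName", "_id": "12354", "_type": "Object"}],
    "min_term_freq": 1,
    "max_query_terms": 12,
}
path, body = build_validate_request("indexName", mlt_clause)
```

Sending `body` as a GET or POST to `path` on the cluster should return an explanation string for the query; whether it shows the selected terms in enough detail likely depends on the Elasticsearch version.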
Any help would be appreciated!
Possibly important: the fields I'm using are analyzed text types. Changing the analyzers seems to affect whether a match is found when a field is empty, so there may be some odd interaction there. I've tried creating all the fields with the same analyzer, and documents with an empty string still don't match as expected.