MLT returns zero results


(Christoph Haas) #1

I'm struggling with using MLT ("more like this") API queries. In my case
the simple documents contain two fields.

Example:

{
  "_index": "debshots",
  "_type": "jdbc",
  "_id": "396",
  "_version": 35,
  "exists": true,
  "_source": {
    "description": "Alarm Clock for GTK Environments",
    "name": "alarm-clock"
  }
}

But when I GET http://localhost:9200/debshots/jdbc/396/_mlt,
Elasticsearch returns zero results:

{
  "took": 3,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "failed": 0
  },
  "hits": {
    "total": 0,
    "max_score": null,
    "hits": []
  }
}

There are many other documents with a description like "Alarm curl
plugin for uWSGI", so I had expected that at least the term "Alarm"
would make them count as "more like this".

I'd welcome a hint as to what is going wrong here. Thanks.

Kindly
…Christoph

--
A distributed system is one in which I cannot get something done
because a machine I've never heard of is down. (Leslie Lamport)

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.


(Randall McRee) #2

Check out:
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl-mlt-query.html

Note that the default values are designed for a large corpus, not a test
example. In particular, you are butting your head against one or more of
(and this is a guess, but I do have a working implementation of MLT):

percent_terms_to_match
min_term_freq
min_doc_freq

pretty sure you need to set these, not let them default.
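
As a rough sketch of what setting these explicitly could look like, the snippet below builds a request URL for the same `_mlt` endpoint with the parameters passed in the query string. The helper name `build_mlt_url` and the values of 1 are only illustrative assumptions, not recommended settings, and the snippet only constructs the URL without sending it.

```python
from urllib.parse import urlencode


def build_mlt_url(base, index, doc_type, doc_id, **params):
    """Build a URL for the document-level _mlt endpoint.

    build_mlt_url is a hypothetical helper for illustration; extra
    keyword arguments become query-string parameters.
    """
    path = f"{base}/{index}/{doc_type}/{doc_id}/_mlt"
    # Sort the parameters so the resulting URL is deterministic.
    query = urlencode(sorted(params.items()))
    return f"{path}?{query}" if query else path


# Assumed values: lower both thresholds to 1 for a small test corpus.
url = build_mlt_url(
    "http://localhost:9200", "debshots", "jdbc", "396",
    min_term_freq=1, min_doc_freq=1,
)
print(url)
```

The resulting URL can then be fetched with any HTTP client, the same way as the original request.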

On Tue, Nov 19, 2013 at 1:17 PM, Christoph Haas email@christoph-haas.de wrote:

I'm struggling with using MLT ("more like this") API queries. In my case
the simple documents contain two fields.



(Christoph Haas) #3

On 20.11.2013 at 02:51, Randall McRee wrote:

Check out:
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl-mlt-query.html

Note that the default values are designed for a large corpus, not a
test example. In particular, you are butting your head against one or
more of (and this is a guess, but I do have a working implementation
of MLT):

percent_terms_to_match
min_term_freq
min_doc_freq

pretty sure you need to set these, not let them default.

Thank you - that was it. I set min_term_freq=1 and received good
results. My fields are rather short (like "Alarm Clock for GTK
Environments"). So I understand that the default is min_term_freq=2, which
means that the term "Alarm" would have to occur at least twice in the
document before it would be used to find related documents.

…Christoph
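
To illustrate the effect described above, here is a toy sketch of a term-frequency cutoff applied to the example description. It is not Elasticsearch's implementation: the function name `select_terms` is hypothetical, and the lowercase whitespace split is an assumed stand-in for the real analyzer.

```python
from collections import Counter


def select_terms(text, min_term_freq):
    """Return terms that occur at least min_term_freq times in text.

    Toy model only: a lowercase whitespace split is assumed in place
    of a real analyzer.
    """
    counts = Counter(text.lower().split())
    return [term for term, n in counts.items() if n >= min_term_freq]


description = "Alarm Clock for GTK Environments"

# Every term occurs exactly once, so a cutoff of 2 leaves nothing to query with.
print(select_terms(description, 2))
# With a cutoff of 1, all terms in the description remain candidates.
print(select_terms(description, 1))
```

With short fields like these, no term repeats, so a cutoff of 2 discards every candidate term and the query has nothing to match on.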


