Phrase suggester and non-existing terms


(Christoph Haas) #1

Dear list,

I'm about to implement full-text search for a website and am a bit
stuck on "did you mean"-style functionality. If no results are found, I
would like to point the visitor to more promising search queries. Here
is what I'm doing:

curl -s -XPOST 'torf:9200/debshots/jdbc/_search?search_type=count' -d '{
  "suggest" : {
    "text" : "editar zaphodbeeblebrox",
    "simple_phrase" : {
      "phrase" : {
        "analyzer" : "simple",
        "field" : "description",
        "size" : 4,
        "real_word_error_likelihood" : 0.95,
        "confidence" : 2.0,
        "gram_size" : 1,
        "highlight" : {
          "pre_tag" : "",
          "post_tag" : ""
        },
        "direct_generator" : [{
          "field" : "description",
          "suggest_mode" : "missing",
          "min_doc_freq" : 1,
          "min_word_len" : 1
        }]
      }
    }
  }
}' | json_pp

I would expect to get only "editor" back, because "editar" is a typo and
"zaphodbeeblebrox" does not exist anywhere in my index. But what I get
instead is:

{
  "hits" : {
    "hits" : [],
    "max_score" : 0,
    "total" : 43682
  },
  "timed_out" : false,
  "suggest" : {
    "simple_phrase" : [
      {
        "length" : 23,
        "options" : [
          {
            "text" : "editor zaphodbeeblebrox",
            "score" : 0.00022499196,
            "highlighted" : "editor zaphodbeeblebrox"
          },
          {
            "text" : "edit zaphodbeeblebrox",
            "score" : 6.074264e-05,
            "highlighted" : "edit zaphodbeeblebrox"
          },
          {
            "text" : "editors zaphodbeeblebrox",
            "score" : 4.8506903e-05,
            "highlighted" : "editors zaphodbeeblebrox"
          }
        ],
        "text" : "editar zaphodbeeblebrox",
        "offset" : 0
      }
    ]
  },
  "_shards" : {
    "failed" : 0,
    "successful" : 1,
    "total" : 1
  },
  "took" : 16
}

The problem is that if a user then searched for "editors
zaphodbeeblebrox" as suggested, they would still find no documents.

So my question is: how can I change the search / phrase-suggest query so
that it drops words that are not found in the index anyway? I was tempted
to take the highlighted suggestions and just keep the words between the
highlight tags, but that feels dirty and I'm sure there is a better way.
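
To make that workaround concrete, here is a minimal Python sketch of the client-side post-processing. It assumes the highlight tags were set to "<em>" and "</em>" (the actual tag values are not visible in the request above), and the function name corrected_terms is made up for illustration. It is not a feature of the suggester itself.

```python
import re


def corrected_terms(highlighted, pre_tag="<em>", post_tag="</em>"):
    """Return only the words the suggester wrapped in highlight tags.

    The tag values are assumptions; pass whatever pre_tag/post_tag
    were sent in the phrase suggester request.
    """
    pattern = re.escape(pre_tag) + r"(.*?)" + re.escape(post_tag)
    return re.findall(pattern, highlighted)


# Hypothetical "highlighted" value, assuming <em>/</em> as the tags.
example = "<em>editor</em> zaphodbeeblebrox"
print(" ".join(corrected_terms(example)))  # prints "editor"
```

Note that this keeps only the terms the suggester changed. A correctly spelled word that does exist in the index is not highlighted either, so it would be dropped along with "zaphodbeeblebrox", which is part of why the approach feels dirty.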

Thanks in advance for your help.

…Christoph

--
A distributed system is one in which I cannot get something done
because a machine I've never heard of is down. (Leslie Lamport)


