Fuzzy


(kalpana pinnaka) #1

Hi, I want a command that returns the "nearest word" for a misspelled word
using a fuzzy query.

For example, I indexed a document with a command like this:

curl -XPUT http://localhost:9200/lowang/ets/11 -d '
{
"_id":11,
"title":"smartfon nokia 5678"
}'

and I use a search command like the one below:

curl -XGET http://localhost:9200/lowang/ets/_search -d '{
"query": {
"bool": {
"must":[
{ "fuzzy": { "title": { "value" : "noia" } } }
]
}
}}'

and now I get results like the following:

{"took":35,"timed_out":false,"_shards":{"total":5,"successful":
5,"failed":0},"hits":{"total":1,"max_score":0.15342641,"hits":
[{"_index":"lowang","_type":"ets","_id":"11","_score":0.15342641,
"_source" :
{
"_id":11,
"title":"smartfon nokia 5678"
}}]}}

Internally, the fuzzy query logic works out that the nearest word for
"noia" is "nokia", and it returns results for the word "nokia".

Instead of the results, I want to display the "nearest word" for the
misspelled word.

How can I get it?


(Ævar Arnfjörð Bjarmason) #2

On Wed, Sep 21, 2011 at 07:29, kalpana pinnaka saikalpana18@gmail.com wrote:

Internally, the fuzzy query logic works out that the nearest word for
"noia" is "nokia", and it returns results for the word "nokia".

Instead of the results, I want to display the "nearest word" for the
misspelled word.

I dealt with this problem today and just wrote code that, given the
result set of a fuzzy query:

  • Got the first result in the set
  • Tokenized the words in it
  • Tokenized the words I'd given ElasticSearch
  • Compared the Levenshtein distance between all words and took into
    account their length
  • Got words like "nokia" back for "noia"
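
The steps above could be sketched roughly as follows in Python. This is
not the code described in this post, only a minimal illustration under
some assumptions: the helper names (levenshtein, tokenize, nearest_words)
are hypothetical, tokenizing is a simple split on word characters, and
"taking length into account" is interpreted as dividing the edit distance
by the length of the longer word. The text of the first hit is assumed to
have been extracted from the search response already.

```python
import re


def levenshtein(a: str, b: str) -> int:
    """Edit distance counting insertions, deletions and substitutions."""
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete ca
                current[j - 1] + 1,      # insert cb
                previous[j - 1] + cost,  # substitute ca with cb (or match)
            ))
        previous = current
    return previous[-1]


def tokenize(text: str) -> list:
    """Assumption: lowercase and split on runs of word characters."""
    return re.findall(r"\w+", text.lower())


def nearest_words(query: str, top_hit_text: str) -> dict:
    """Map each query word to the closest word in the first hit's text."""
    candidates = tokenize(top_hit_text)
    if not candidates:
        return {}
    result = {}
    for word in tokenize(query):
        # Assumption: normalise by the longer word's length so that short
        # and long candidates are compared on the same scale.
        def score(candidate, word=word):
            return levenshtein(word, candidate) / max(len(word), len(candidate))
        result[word] = min(candidates, key=score)
    return result


print(nearest_words("noia", "smartfon nokia 5678"))
```

With the document from the original question as the first hit, this maps
"noia" to "nokia".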

I wish there was an easier API for this, but I haven't found it yet.

