Elastic Search for misspelled words


(samir.selia) #1

Dear All,

I want to display the best possible results for misspelled search terms.

I tried the fuzzy query. It works well for single-word search terms,
but for multi-word terms it returns no results.

Below is the sample code.

$result = $es->search(array(
    'query' => array(
        "fuzzy" => array(
            "name" => array(
                "value" => "jay z",
                "boost" => 1.0,
                "min_similarity" => 0.5,
                "prefix_length" => 1
            )
        )
    ),
    "from" => $start,
    "size" => $limit
));

Thanks,
Samir

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.


(Ivan Brusic) #2

For phrase queries, try using a match query with fuzziness enabled. It will
create a fuzzy query for each term in your search.

http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl-match-query.html
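
A minimal sketch of the request body this suggests, written in Python as plain dictionaries (PHP arrays have the same shape). The field name `name` comes from the original post. The helper names `fuzzy_match_body` and `per_term_fuzzy` are hypothetical, and the `fuzziness` parameter follows the current query DSL (older releases, such as the one in this thread, used `min_similarity` in the fuzzy query). The second helper only illustrates the idea of one fuzzy clause per term by splitting on whitespace, which is a simplification of what the analyzer does.

```python
import json


def fuzzy_match_body(field, text, fuzziness=1, prefix_length=1, start=0, limit=10):
    """Build a search body whose match query has fuzziness enabled."""
    return {
        "query": {
            "match": {
                field: {
                    "query": text,
                    "fuzziness": fuzziness,
                    "prefix_length": prefix_length,
                }
            }
        },
        "from": start,
        "size": limit,
    }


def per_term_fuzzy(field, text, fuzziness=1, prefix_length=1):
    """Illustration only: one fuzzy clause per whitespace-separated term."""
    clauses = [
        {
            "fuzzy": {
                field: {
                    "value": term,
                    "fuzziness": fuzziness,
                    "prefix_length": prefix_length,
                }
            }
        }
        for term in text.lower().split()
    ]
    return {"bool": {"should": clauses}}


if __name__ == "__main__":
    # Print both bodies as JSON, ready to send with any HTTP client.
    print(json.dumps(fuzzy_match_body("name", "jay z"), indent=2))
    print(json.dumps(per_term_fuzzy("name", "jay z"), indent=2))
```

This also hints at why a single fuzzy query on "jay z" finds nothing: the fuzzy query compares the whole string against individual indexed terms, whereas the match query analyzes the text into terms first.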

Cheers,

Ivan



(samir.selia) #3

Hi Ivan,

Thank you for your reply.

I tried the match query, but for a single word the fuzzy query returns more
appropriate results.
Is there a way to combine match and fuzzy in a single query, or some other
method that gives the best possible results for both single and multiple words?

Below is a sample match query:

$result = $es->search(array(
    'query' => array(
        'match' => array(
            'name' => array(
                "query" => 'rihna',
                "operator" => "or",
                "fuzziness" => 1.0,
                "prefix_length" => 1
            )
        )
    )
));



(Hendrik) #4

Maybe a phonetic search is the right approach for you:
https://github.com/elasticsearch/elasticsearch-analysis-phonetic



(samir.selia) #5

Hi Hendrik,

Thank you for your reply.

The phonetic search plugin is written in Java.
I am looking for a PHP library.



(sina.tamanna) #6

Phonetic search is a plugin. Once you install it, its syntax is available
from PHP or any other language. It is written in Java, but that does not
limit its use to Java.
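
To illustrate the point: once the plugin is installed on the server, a client only has to send JSON. The sketch below builds index settings that reference the plugin's `phonetic` token filter. It is written in Python, but the same structure can be produced from PHP arrays. The analyzer name `artist_analyzer`, the filter name `artist_metaphone`, and the helper name `phonetic_settings` are illustrative assumptions.

```python
import json


def phonetic_settings(encoder="metaphone"):
    """Index settings that wire a phonetic token filter into a custom analyzer."""
    return {
        "settings": {
            "analysis": {
                "analyzer": {
                    "artist_analyzer": {
                        "type": "custom",
                        "tokenizer": "standard",
                        "filter": ["lowercase", "artist_metaphone"],
                    }
                },
                "filter": {
                    "artist_metaphone": {
                        "type": "phonetic",  # provided by the phonetic analysis plugin
                        "encoder": encoder,
                        "replace": False,  # keep the original token next to the phonetic one
                    }
                },
            }
        }
    }


if __name__ == "__main__":
    # The output is plain JSON; any HTTP client in any language can PUT it.
    print(json.dumps(phonetic_settings(), indent=2))
```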



(Nik Everett) #7

I know this isn't what you were looking for, but I have to make sure you are aware that the phrase suggester does a really good job of finding misspellings. It'll even find misspelled phrases made of properly spelled words. It has its caveats but might be worth looking into.
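
A minimal sketch of a phrase suggester request body, assuming the field `name` from the thread. The suggestion name `name_phrase` and the helper `phrase_suggest_body` are hypothetical. In practice the suggester tends to work best on a field analyzed with shingles, which is not shown here.

```python
import json


def phrase_suggest_body(text, field="name", size=1):
    """Build a search body that asks the phrase suggester for corrections."""
    return {
        "suggest": {
            "text": text,
            "name_phrase": {  # arbitrary label; results come back under this key
                "phrase": {
                    "field": field,
                    "size": size,
                }
            },
        }
    }


if __name__ == "__main__":
    print(json.dumps(phrase_suggest_body("rihna"), indent=2))
```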


(samir.selia) #8

Thank you all for your suggestions.

Phonetic search worked for me.

But it doesn't return any results when the name contains accented (non-ASCII) characters.
For example, there is an artist named "Beyoncé", with the character é.
Searching for Beyoncé (with the accent) returns the proper results,
but searching for Beyonce (without the accent) returns no results.

Any suggestions will be highly appreciated.



(David Pilato) #9

Did you try to apply an asciifolding filter as well?
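
To show what folding buys you, here is a rough Python illustration of the idea. It is not Lucene's ASCII folding implementation, only an approximation that decomposes characters and drops combining marks. The helper name `strip_accents` is hypothetical.

```python
import unicodedata


def strip_accents(text):
    """Approximate ASCII folding: decompose characters, then drop combining marks."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


if __name__ == "__main__":
    # Both spellings reduce to the same string, so they can match each other.
    print(strip_accents("Beyoncé"))
    print(strip_accents("Beyonce"))
```

When the same folding is applied at index time and at search time, "Beyoncé" and "Beyonce" produce the same term. In the index settings this corresponds to adding `asciifolding` to the analyzer's filter list.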

--
David :wink:
Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs



(Jörg Prante) #10

If you use Unicode, you should use ICU folding, which is far more powerful
than ASCII folding.
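
A sketch of what that could look like in the index settings, assuming the ICU analysis plugin is installed so that the `icu_folding` token filter is available. The analyzer name `artist_folding` and the helper name `icu_folding_settings` are hypothetical.

```python
import json


def icu_folding_settings():
    """Index settings for an analyzer that applies ICU folding to each token."""
    return {
        "settings": {
            "analysis": {
                "analyzer": {
                    "artist_folding": {
                        "type": "custom",
                        "tokenizer": "standard",
                        # icu_folding comes from the ICU analysis plugin
                        "filter": ["icu_folding"],
                    }
                }
            }
        }
    }


if __name__ == "__main__":
    print(json.dumps(icu_folding_settings(), indent=2))
```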

Jörg



(vallabh-2) #11

Thanks for your suggestions.

asciifolding works for me for accented characters,

but it does not work for special characters.

For example, there is an artist named !!! (chk chk chk), with three exclamation marks.
When I search for chk, I get the result, but when I search for only !!!
(three exclamation marks), I get nothing.

There is also an artist named ke$ha (with a $ in the name),
which I want to find by searching for kesha (without the $).

Any suggestions will be appreciated.



(Jörg Prante) #12

The names you are looking for are named entities. Each entity can have
variant spellings, such as Kesha and Ke$ha.

Libraries solve this challenge by using name authority files. For example,
the entity for Kesha is http://viaf.org/viaf/81878968/ and under this URL,
dereferenced to a URI, you can find the variant names, even at an
international scope.

To detect such named entities, a special token filter would be required.
The advanced ones are named entity recognizers (NER) backed by a large knowledge base.

The standard tokenizers treat '$' and '!!!' as word-delimiting characters.
If it is feasible, you can create synonyms for all the words you want to
treat as exceptions, and set up a synonym filter.
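
The effect of the tokenizer on these names can be illustrated with a rough Python stand-in. The two helpers are hypothetical and only approximate the behaviour: a regular expression that keeps runs of word characters stands in for a tokenizer that treats punctuation as delimiters, and a plain split stands in for a whitespace tokenizer.

```python
import re


def split_on_non_word(text):
    """Rough stand-in for a tokenizer that treats punctuation as delimiters."""
    return re.findall(r"\w+", text.lower())


def split_on_whitespace(text):
    """Rough stand-in for a whitespace tokenizer."""
    return text.lower().split()


if __name__ == "__main__":
    for name in ["Ke$ha", "!!!", "!!! (chk chk chk)"]:
        print(name, split_on_non_word(name), split_on_whitespace(name))
```

With the first helper, "Ke$ha" becomes the two tokens `ke` and `ha`, and "!!!" produces no tokens at all, so there is nothing left for a query or a synonym rule to match. With the second helper, both survive as single tokens.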

Jörg



(vallabh-2) #13

Thanks for the quick suggestions.
I tried this method on my side, but it didn't work for me:

curl -X PUT 'http://localhost:9200/admin/?pretty=true' -d '
{
    "settings" : {
        "analysis" : {
            "analyzer" : {
                "artist_analyzer" : {
                    "tokenizer" : "standard",
                    "filter" : ["standard", "lowercase", "artist_metaphone", "asciifolding", "synonym"]
                }
            },
            "filter" : {
                "artist_metaphone" : {
                    "type" : "phonetic",
                    "encoder" : "metaphone",
                    "replace" : false
                },
                "synonym" : {
                    "type" : "synonym",
                    "synonyms" : [
                        "kesha => ke$ha",
                        "!!! => !!! (chk chk chk)"
                    ]
                }
            }
        }
    }
}
'
Am I doing something wrong?
Any suggestions will be highly appreciated.



(Jörg Prante) #14

I have put up an artist demo:

https://gist.github.com/jprante/7270193

Maybe you can get some inspiration from it.

Jörg



(vallabh-2) #15

Thanks for the reply.
The synonym method works for me: it returns ke$ha when I search for either
kesha or ke$ha.
To get the synonyms working I changed "tokenizer" : "standard" to
"tokenizer" : "whitespace", which introduced a different problem.
I have an artist named jay-z (with a hyphen in between). Earlier, searching
for jay z (with a space) returned jay-z, but this no longer happens because
of the tokenizer change.
Is there a way to use two tokenizers?
Also, is it possible to search for an exclamation mark (!) in an Elasticsearch query?

Below is the code:

curl -X PUT 'http://localhost:9200/admin/?pretty=true' -d '
{
    "settings" : {
        "analysis" : {
            "analyzer" : {
                "artist_analyzer" : {
                    "tokenizer" : "whitespace",
                    "filter" : ["standard", "lowercase", "synonym", "artist_metaphone", "asciifolding"]
                }
            },
            "filter" : {
                "artist_metaphone" : {
                    "type" : "phonetic",
                    "encoder" : "metaphone",
                    "replace" : false
                },
                "synonym" : {
                    "type" : "synonym",
                    "synonyms_path" : "/var/www/html/elasticsearch-master/synonyms.txt"
                }
            }
        }
    }
}
'

echo; echo
echo 'Create the mapping.'
curl -X PUT 'http://localhost:9200/admin/jos_artist_details/_mapping?pretty=true' -d '
{
    "jos_artist_details" : {
        "properties" : {
            "name" : {
                "type": "string",
                "index_analyzer": "artist_analyzer",
                "search_analyzer": "artist_analyzer"
            }
        }
    }
}
'
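
An analyzer takes a single tokenizer, so one possible workaround (not proposed in the thread, and offered only as an assumption to test) is a character filter that turns hyphens into spaces before the whitespace tokenizer runs. The sketch below builds such settings with the built-in `pattern_replace` character filter and simulates the effect in plain Python. The names `hyphen_to_space`, `hyphen_settings`, and `simulate` are hypothetical, and the simulation ignores the synonym and phonetic filters.

```python
import json


def hyphen_settings():
    """Index settings: replace '-' with a space, then tokenize on whitespace."""
    return {
        "settings": {
            "analysis": {
                "char_filter": {
                    "hyphen_to_space": {
                        "type": "pattern_replace",
                        "pattern": "-",
                        "replacement": " ",
                    }
                },
                "analyzer": {
                    "artist_analyzer": {
                        "type": "custom",
                        "char_filter": ["hyphen_to_space"],
                        "tokenizer": "whitespace",
                        "filter": ["lowercase"],
                    }
                },
            }
        }
    }


def simulate(text):
    """Plain-Python approximation of the analyzer above."""
    return text.lower().replace("-", " ").split()


if __name__ == "__main__":
    print(json.dumps(hyphen_settings(), indent=2))
    for name in ["Jay-Z", "jay z", "ke$ha", "!!!"]:
        print(name, simulate(name))
```

In this simulation "Jay-Z" and "jay z" produce the same tokens, while "ke$ha" and "!!!" stay intact, so the synonym rules would still have something to match.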

On Saturday, November 2, 2013 12:31:47 AM UTC+5:30, Jörg Prante wrote:

I have put up an artist demo

https://gist.github.com/jprante/7270193

Maybe you can get some inspiration from it.

Jörg


