Elastic Search for misspelled words

samir_selia · October 23, 2013, 3:09pm

Dear All,

I want to display the best possible results for misspelled search terms.

I tried the fuzzy query. It works well for a single-word search term, but
for multiple words it doesn't return any results.

Below is the sample code.

    $result = $es->search(array(
        'query' => array(
            "fuzzy" => array(
                "name" => array(
                    "value" => "jay z",
                    "boost" => 1.0,
                    "min_similarity" => 0.5,
                    "prefix_length" => 1
                )
            )
        ),
        "from" => $start,
        "size" => $limit
    ));

Thanks,
Samir

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

Ivan · October 23, 2013, 10:27pm

For phrase queries, try using a match query with fuzziness enabled. It will
create a fuzzy query for each term in your search.

Cheers,

Ivan
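
A minimal sketch of the match query described above, written as a curl request so it can be ported to any client. The index `admin` and the field `name` come from later posts in this thread; `ES_URL`, `FUZZY_MATCH_BODY` and `search_fuzzy_match` are hypothetical names, and how the `fuzziness` value is interpreted depends on the Elasticsearch version, so treat the number as an assumption.

```shell
# Assumption: Elasticsearch is reachable at ES_URL (defaults to localhost).
ES_URL="${ES_URL:-http://localhost:9200}"

# The match query analyzes "jay z" into separate terms and applies fuzzy
# matching to each one. The term-level fuzzy query does not analyze its
# input, so it looks for a single indexed term resembling "jay z".
FUZZY_MATCH_BODY=$(cat <<'EOF'
{
  "query": {
    "match": {
      "name": {
        "query": "jay z",
        "operator": "or",
        "fuzziness": 1,
        "prefix_length": 1
      }
    }
  }
}
EOF
)

# Hypothetical helper: send the body to the _search endpoint of the admin index.
search_fuzzy_match() {
  curl -s -XPOST "$ES_URL/admin/_search?pretty" \
    -H 'Content-Type: application/json' \
    -d "$FUZZY_MATCH_BODY"
}
```

Calling `search_fuzzy_match` sends the request. The JSON body maps one-to-one onto the PHP array passed to `$es->search()`.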

samir_selia · October 24, 2013, 5:30am

Hi Ivan,

Thank you for your reply.

I tried the match query. For a single word, the fuzzy query returns more
appropriate results.
Is there any way to combine match and fuzzy in a single query, or another
method to get the best possible results for both single and multiple words?

Below is a sample match query:

    $result = $es->search(array(
        'query' => array(
            'match' => array(
                'name' => array(
                    "query" => 'rihna',
                    "operator" => "or",
                    "fuzziness" => 1.0,
                    "prefix_length" => 1
                )
            )
        )
    ));
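
One possible way to combine the two, sketched here as an assumption rather than a recommendation from the thread, is a `bool` query with both queries as `should` clauses. A document then matches if either clause matches, and scores higher when both do. `ES_URL`, `COMBINED_BODY` and `search_combined` are hypothetical names; the fuzzy clause relies on the server's default fuzziness because the parameter name changed between versions.

```shell
# Assumption: Elasticsearch is reachable at ES_URL (defaults to localhost).
ES_URL="${ES_URL:-http://localhost:9200}"

# bool/should: with no must clauses, at least one should clause has to match.
COMBINED_BODY=$(cat <<'EOF'
{
  "query": {
    "bool": {
      "should": [
        { "fuzzy": { "name": { "value": "rihna", "prefix_length": 1 } } },
        { "match": { "name": { "query": "rihna", "fuzziness": 1, "prefix_length": 1 } } }
      ]
    }
  }
}
EOF
)

# Hypothetical helper: run the combined query against the admin index.
search_combined() {
  curl -s -XPOST "$ES_URL/admin/_search?pretty" \
    -H 'Content-Type: application/json' \
    -d "$COMBINED_BODY"
}
```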

Hendrik · October 24, 2013, 5:40am

Maybe a phonetic search is the right approach for you: see the Phonetic
Analysis Plugin for Elasticsearch (GitHub: elastic/elasticsearch-analysis-phonetic).

samir_selia · October 24, 2013, 6:39am

Hi Hendrik,

Thank you for your reply.

The phonetic search plugin is written in Java.
I am looking for a PHP library.

sina_tamanna · October 24, 2013, 7:13am

Phonetic search is a plugin. You can install it, and the syntax will be
available from PHP or any other language. It is written in Java, but that
doesn't limit its usage to Java.

nik9000 · October 24, 2013, 1:03pm

I know this isn't what you were looking for, but I have to make sure you are aware that the phrase suggester does a really good job of finding misspellings. It'll even find misspelled phrases made of properly spelled words. It has its caveats but might be worth looking into.
Sent from my iPhone
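
A rough sketch of what a phrase suggester request can look like, assuming the `name` field of the `admin` index used elsewhere in this thread. The suggestion name `name_suggestion`, the sample text and the helper names are made up for illustration. The suggester typically works better against a field analyzed with shingles, which this sketch does not set up.

```shell
# Assumption: Elasticsearch is reachable at ES_URL (defaults to localhost).
ES_URL="${ES_URL:-http://localhost:9200}"

# "size": 0 suppresses normal hits so only suggestions come back.
# The global "text" is the possibly misspelled user input.
PHRASE_SUGGEST_BODY=$(cat <<'EOF'
{
  "size": 0,
  "suggest": {
    "text": "jay zz",
    "name_suggestion": {
      "phrase": {
        "field": "name",
        "size": 3
      }
    }
  }
}
EOF
)

# Hypothetical helper: ask the admin index for phrase suggestions.
suggest_phrase() {
  curl -s -XPOST "$ES_URL/admin/_search?pretty" \
    -H 'Content-Type: application/json' \
    -d "$PHRASE_SUGGEST_BODY"
}
```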

samir_selia · October 30, 2013, 5:06am

Thank you all for your suggestions.

Phonetic search worked for me.

But it doesn't handle Unicode characters well.
For example, there is an artist named "Beyoncé", with the Unicode character é.
Searching for Beyoncé (with the Unicode character) returns the proper results,
but searching for Beyonce (without it) returns no results.

Any suggestions will be highly appreciated.

dadoonet · October 30, 2013, 5:28am

Did you try to apply an asciifolding filter as well?

--
David
Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs
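
For illustration, a minimal analyzer that applies the built-in `asciifolding` token filter might be defined as below. The index name `artists_folding_demo`, the analyzer name `folding_analyzer` and the helper function are hypothetical; in the setup discussed in this thread the filter would be added to the existing analyzer's filter list instead.

```shell
# Assumption: Elasticsearch is reachable at ES_URL (defaults to localhost).
ES_URL="${ES_URL:-http://localhost:9200}"

# lowercase + asciifolding: "Beyoncé" and "Beyonce" should both end up
# as the token "beyonce", so either spelling can match.
FOLDING_SETTINGS=$(cat <<'EOF'
{
  "settings": {
    "analysis": {
      "analyzer": {
        "folding_analyzer": {
          "tokenizer": "standard",
          "filter": ["lowercase", "asciifolding"]
        }
      }
    }
  }
}
EOF
)

# Hypothetical helper: create a demo index with these analysis settings.
create_folding_index() {
  curl -s -XPUT "$ES_URL/artists_folding_demo?pretty" \
    -H 'Content-Type: application/json' \
    -d "$FOLDING_SETTINGS"
}
```

The analyzer only takes effect for fields whose mapping references it, and existing documents need to be reindexed after the change.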

jprante · October 30, 2013, 8:07am

If you use Unicode, you should use ICU folding, which is far more powerful
than ASCII folding.

Jörg
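
As a sketch of the ICU variant: the `icu_folding` token filter comes from the separate ICU analysis plugin, which has to be installed on every node first. The index name `artists_icu_demo`, the analyzer name `icu_folding_analyzer` and the helper function are hypothetical.

```shell
# Assumption: Elasticsearch is reachable at ES_URL (defaults to localhost)
# and the ICU analysis plugin is installed.
ES_URL="${ES_URL:-http://localhost:9200}"

# icu_folding also performs case folding, so no separate lowercase
# filter is listed here.
ICU_SETTINGS=$(cat <<'EOF'
{
  "settings": {
    "analysis": {
      "analyzer": {
        "icu_folding_analyzer": {
          "tokenizer": "standard",
          "filter": ["icu_folding"]
        }
      }
    }
  }
}
EOF
)

# Hypothetical helper: create a demo index with the ICU-based analyzer.
create_icu_index() {
  curl -s -XPUT "$ES_URL/artists_icu_demo?pretty" \
    -H 'Content-Type: application/json' \
    -d "$ICU_SETTINGS"
}
```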

vallabh_2 · October 31, 2013, 10:26am

Thanks for your suggestions.

asciifolding works for me for Unicode characters.

But it does not work for special characters.

For example, there is an artist named !!! (chk chk chk), with three
exclamation marks. When I search for chk, I get the result, but when I
search for only !!! (three exclamation marks), I get nothing.

There is also an artist, ke$ha (with a $ in the name),
whom I want to find by searching for kesha (without the $).

Any suggestions will be appreciated.

jprante · October 31, 2013, 10:46am

The names you are looking for are named entities. Each entity can have
variant spellings, such as Kesha and Ke$ha.

Libraries solve this challenge by using name authority files. For example,
the entity for Kesha is http://viaf.org/viaf/81878968/ and under this URL,
dereferenced as a URI, you can find the variant names, even at an
international scope.

To detect such named entities, a special token filter would be required.
The advanced ones are named entity recognizers (NER) backed by a large
knowledge base.

The standard tokenizers treat '$' and '!!!' as word-delimiting characters.
If it is feasible, you can create synonyms for all the words you want to
treat as exceptions, and set up a synonym filter.

Jörg

vallabh_2 · October 31, 2013, 1:55pm

Thanks for the quick suggestions.
I tried this method on my side, but it didn't work for me:

    curl -X PUT 'http://localhost:9200/admin/?pretty=true' -d '
    {
      "settings" : {
        "analysis" : {
          "analyzer" : {
            "artist_analyzer" : {
              "tokenizer" : "standard",
              "filter" : ["standard", "lowercase",
                          "artist_metaphone", "asciifolding", "synonym"]
            }
          },
          "filter" : {
            "artist_metaphone" : {
              "type" : "phonetic",
              "encoder" : "metaphone",
              "replace" : false
            },
            "synonym" : {
              "type" : "synonym",
              "synonyms" : [
                "kesha => ke$ha",
                "!!! => !!! (chk chk chk)"
              ]
            }
          }
        }
      }
    }
    '

Am I doing something wrong?
Any suggestions will be highly appreciated.

jprante · November 1, 2013, 7:01pm

I have put up an artist demo (artist.sh):

https://gist.github.com/jprante/7270193

    curl -XDELETE 'localhost:9200/test'

    rm /tmp/synonyms.txt
    echo "kesha, ke\$ha" >> /tmp/synonyms.txt
    echo "chk chk chk, !!!" >> /tmp/synonyms.txt

    curl -XPUT 'localhost:9200/test' -d '
    {
        "settings" : {
            "analysis" : {

(The embedded preview is truncated here; the full script is in the gist.)

Maybe you can get some inspiration from it.

Jörg

vallabh_2 · November 6, 2013, 5:58am

Thanks for the reply.
The synonym method works for me: it returns the ke$ha result when I search
for kesha or ke$ha.
For the synonyms I changed "tokenizer" : "standard" to "tokenizer" :
"whitespace", and ran into a different problem.
I have an artist named jay-z (with a hyphen in between). Earlier, searching
for jay z (with whitespace) returned the jay-z result, but this no longer
happens because of the tokenizer change.
Is there a way to use two tokenizers?
Also, is it possible to search for an exclamation mark (!) in an
Elasticsearch query?

Below is the code:

    curl -X PUT 'http://localhost:9200/admin/?pretty=true' -d '
    {
      "settings" : {
        "analysis" : {
          "analyzer" : {
            "artist_analyzer" : {
              "tokenizer" : "whitespace",
              "filter" : ["standard", "lowercase", "synonym",
                          "artist_metaphone", "asciifolding"]
            }
          },
          "filter" : {
            "artist_metaphone" : {
              "type" : "phonetic",
              "encoder" : "metaphone",
              "replace" : false
            },
            "synonym" : {
              "type" : "synonym",
              "synonyms_path" : "/var/www/html/elasticsearch-master/synonyms.txt"
            }
          }
        }
      }
    }
    '

    echo; echo
    echo 'Create the mapping.'
    curl -X PUT 'http://localhost:9200/admin/jos_artist_details/_mapping?pretty=true' -d '
    {
      "jos_artist_details" : {
        "properties" : {
          "name" : {
            "type": "string",
            "index_analyzer": "artist_analyzer",
            "search_analyzer": "artist_analyzer"
          }
        }
      }
    }
    '
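
An analyzer has exactly one tokenizer, but character filters run before it. One possible approach, offered here as an untested assumption and not something confirmed in this thread, is a `pattern_replace` character filter that turns hyphens into spaces before the whitespace tokenizer runs, so that jay-z is indexed as the tokens jay and z while ke$ha and !!! stay intact. The names `hyphen_to_space`, `artists_hyphen_demo` and the helper function are hypothetical, and only the `lowercase` filter is listed to keep the sketch self-contained; the synonym, phonetic and folding filters from the post above would be appended to the same list.

```shell
# Assumption: Elasticsearch is reachable at ES_URL (defaults to localhost).
ES_URL="${ES_URL:-http://localhost:9200}"

# The char_filter rewrites "-" to " " in the raw text; the whitespace
# tokenizer then splits on that space but leaves "$" and "!" alone.
HYPHEN_SETTINGS=$(cat <<'EOF'
{
  "settings": {
    "analysis": {
      "char_filter": {
        "hyphen_to_space": {
          "type": "pattern_replace",
          "pattern": "-",
          "replacement": " "
        }
      },
      "analyzer": {
        "artist_analyzer": {
          "char_filter": ["hyphen_to_space"],
          "tokenizer": "whitespace",
          "filter": ["lowercase"]
        }
      }
    }
  }
}
EOF
)

# Hypothetical helper: create a demo index with the hyphen-aware analyzer.
create_hyphen_index() {
  curl -s -XPUT "$ES_URL/artists_hyphen_demo?pretty" \
    -H 'Content-Type: application/json' \
    -d "$HYPHEN_SETTINGS"
}
```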
