How to get results for a misspelled query without using a fuzzy query?


(Michał Orzechowski) #1

Hi,

I am new to ElasticSearch and I am trying to handle the following case:

I have indexed some products:

curl -XPUT 'http://localhost:9200/shop/cameras/1' -d '
{
    "product_name": "Digital SLR Camera Nikon D3100 14.2MP"
}
'

curl -XPUT 'http://localhost:9200/shop/cameras/2' -d '
{
    "product_name": "Camera Nikon D60 10.2MP"
}
'

and I am trying to retrieve them when the query is misspelled. It works
quite well with a fuzzy query like:

curl -XGET 'http://localhost:9200/shop/cameras/_search' -d '
{
    "query" : {
        "fuzzy" : {
            "product_name" : {
                "value" : "nikko",
                "min_similarity" : 0.5
            }
        }
    }
}
'

However, the documentation for the fuzzy query warns that this
solution is not scalable. Is there another way to get results for a
misspelled query without using fuzzy queries?
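To make the scalability warning concrete, here is a minimal Python sketch of how a Lucene 3.x-era fuzzy query with `min_similarity` decides whether a term matches. It assumes the old formula `similarity = 1 - edit_distance / min(len(term), len(query))` with no prefix length, and the term list is a hypothetical stand-in for what the two product names might be tokenized into. As far as I understand, the brute-force comparison against every term in the field is why fuzzy queries of that era slow down as the index grows.

```python
def levenshtein(a, b):
    """Edit distance (insertions, deletions, substitutions) via dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free when the chars are equal)
            ))
        prev = curr
    return prev[-1]


def similarity(term, query):
    """Assumed Lucene 3.x formula: 1 - distance / length of the shorter string."""
    if not term or not query:
        return 1.0 if term == query else 0.0
    return 1.0 - levenshtein(term, query) / min(len(term), len(query))


def fuzzy_scan(terms, query, min_similarity=0.5):
    """Compare the query against EVERY indexed term; cost grows with the term dictionary."""
    query = query.lower()
    return [t for t in terms if similarity(t, query) >= min_similarity]


if __name__ == "__main__":
    # Hypothetical terms, roughly what the two product names could be tokenized into.
    terms = ["digital", "slr", "camera", "nikon", "d3100", "d60"]
    print(fuzzy_scan(terms, "nikko"))       # ['nikon'] (distance 2, similarity 0.6)
    print(fuzzy_scan(terms, "nikko", 0.7))  # [] (0.6 is below the threshold)
```

With `min_similarity` at 0.5, "nikko" reaches "nikon" because two edits over five characters leave a similarity of 0.6.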

Thanks in advance.
Michal


(Karussell) #2

You could try to create your own analyzer (or take one from Solr ;)),
either via n-grams or via 'phonetic terms': terms that sound alike are
transformed into the same term; see http://en.wikipedia.org/wiki/Soundex
etc.
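To illustrate the 'phonetic terms' idea, here is a minimal Python sketch of the classic American Soundex algorithm. Words that sound alike collapse into the same four-character code, so indexing and querying the codes instead of the raw words tolerates some misspellings. This is a standalone illustration of the algorithm, not the code behind any Elasticsearch or Solr filter.

```python
# American Soundex digit groups; vowels and y are absent (they map to nothing).
_GROUPS = {"bfpv": "1", "cgjkqsxz": "2", "dt": "3", "l": "4", "mn": "5", "r": "6"}
CODES = {ch: digit for letters, digit in _GROUPS.items() for ch in letters}


def soundex(word):
    """Return the 4-character American Soundex code, e.g. 'Robert' -> 'R163'."""
    letters = [c for c in word.lower() if c in "abcdefghijklmnopqrstuvwxyz"]
    if not letters:
        return ""
    code = letters[0].upper()          # the first letter is kept as-is
    last = CODES.get(letters[0], "")   # ...but its digit still suppresses repeats
    for ch in letters[1:]:
        if ch in "hw":
            continue                   # h/w do not separate letters with the same digit
        digit = CODES.get(ch, "")      # vowels (and y) yield "" and reset `last`
        if digit and digit != last:
            code += digit
            if len(code) == 4:
                break
        last = digit
    return code.ljust(4, "0")          # pad short codes with zeros


if __name__ == "__main__":
    print(soundex("Nikon"), soundex("Nikkon"), soundex("Nicon"))  # N250 N250 N250
    print(soundex("nikko"))                                       # N200
```

Note that "nikko" gets a different code from "nikon": phonetic encoding forgives sound-alike misspellings, not every typo, which is why n-grams or fuzzy matching may still be needed alongside it.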

Another starting point could be to look at Solr's spellchecking
mechanism and port it (as an analyzer) to ES :wink:

http://wiki.apache.org/solr/SpellCheckComponent

On the other hand, I would still try fuzzy, or at least fall back to
querying the index via fuzzy (in the background) when the normal query
returns no results ...
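The fallback idea (run the normal query first and use fuzzy matching only when it returns nothing) can be sketched in Python as below. Everything here is hypothetical: a small in-memory inverted index stands in for Elasticsearch, the similarity ratio from the standard `difflib` module stands in for the fuzzy query, and the fallback runs synchronously rather than in the background.

```python
import difflib
from collections import defaultdict

# Hypothetical in-memory stand-in for the two indexed products.
DOCS = {
    1: "Digital SLR Camera Nikon D3100 14.2MP",
    2: "Camera Nikon D60 10.2MP",
}


def build_index(docs):
    """Inverted index: lowercased whitespace token -> set of document ids."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index


def search_with_fallback(index, query, cutoff=0.6):
    """Exact term lookup first; only when it finds nothing, retry fuzzily.

    Returns (matching doc ids, whether the fuzzy fallback was used).
    difflib's similarity ratio stands in for Elasticsearch's fuzzy query here.
    """
    term = query.lower()
    hits = set(index.get(term, ()))
    if hits:
        return hits, False
    for candidate in difflib.get_close_matches(term, list(index), n=5, cutoff=cutoff):
        hits |= index[candidate]
    return hits, True


if __name__ == "__main__":
    index = build_index(DOCS)
    print(search_with_fallback(index, "Nikon"))  # ({1, 2}, False): exact match, no fuzzy cost
    print(search_with_fallback(index, "nikko"))  # ({1, 2}, True): found only via the fallback
```

The design point is that correctly spelled queries never pay for the expensive fuzzy pass; only queries that would otherwise return nothing do.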

Regards,
Peter.

In reply to Michał Orzechowski (3 Feb., 14:43).


(Karussell) #3

Also take a look here:

http://elasticsearch-users.115913.n3.nabble.com/Terms-API-for-Spellchecker-td1691838.html

In reply to Karussell (5 Feb., 00:50).


(Michał Orzechowski) #4

Thanks for the help! I am going to look into those Solr components.

In reply to Karussell (5 Feb., 00:51).


(Shay Banon) #5

Those analyzers are already provided in ES (soundex and ngram). Regarding the spell check component, it is problematic because it requires a second index to be built alongside the original one, which gets really complicated in a distributed system.
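To illustrate what an ngram analyzer buys you, here is a hedged Python sketch (not Elasticsearch's implementation). Both the indexed tokens and the query are broken into overlapping character bigrams, and a document matches when it shares enough grams with the query, so a misspelled word still overlaps heavily with the correct one. The bigram size and the hypothetical `min_shared` threshold are arbitrary choices for the example; a real index would rank through its normal relevance scoring rather than a raw count.

```python
from collections import Counter, defaultdict

# Hypothetical stand-in for the two indexed products.
DOCS = {
    1: "Digital SLR Camera Nikon D3100 14.2MP",
    2: "Camera Nikon D60 10.2MP",
}


def char_ngrams(token, n=2):
    """Overlapping character n-grams, e.g. 'nikon' -> ni, ik, ko, on."""
    token = token.lower()
    return [token[i:i + n] for i in range(len(token) - n + 1)]


def build_ngram_index(docs, n=2):
    """Index-time analysis: every token is split into n-grams and each gram is indexed."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in text.split():
            for gram in char_ngrams(token, n):
                index[gram].add(doc_id)
    return index


def ngram_search(index, query, n=2, min_shared=2):
    """Query-time analysis uses the same n-grams; rank docs by how many grams they share."""
    shared = Counter()
    for gram in set(char_ngrams(query, n)):
        for doc_id in index.get(gram, ()):
            shared[doc_id] += 1
    ranked = sorted(shared.items(), key=lambda kv: (-kv[1], kv[0]))
    return [doc_id for doc_id, count in ranked if count >= min_shared]


if __name__ == "__main__":
    index = build_ngram_index(DOCS)
    print(ngram_search(index, "nikko"))   # [1, 2]: shares ni, ik, ko with "nikon"
    print(ngram_search(index, "digtal"))  # [1]: shares di, ig, ta, al with "digital"
```

The trade-off is index size: every token contributes several grams, so the index grows compared with indexing whole words.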

The reason for not tackling this right now is that there is some really cool work being done in Lucene trunk (the upcoming 4.0) that will provide spell-check-like functionality while working directly on the original index.
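The idea of computing spelling suggestions from the original index, with no separate spellcheck index, can be sketched like this. It is a hypothetical, simplified Python illustration: suggestions come straight from the main index's own terms, ranked by edit distance and then by document frequency. It is likely close in spirit to Lucene 4's DirectSpellChecker, although, as far as I know, that implementation relies on automaton-based fuzzy term enumeration rather than the linear scan shown here.

```python
from collections import Counter

# Hypothetical stand-in for the two indexed products.
DOCS = {
    1: "Digital SLR Camera Nikon D3100 14.2MP",
    2: "Camera Nikon D60 10.2MP",
}


def edit_distance(a, b):
    """Levenshtein distance via dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = curr
    return prev[-1]


def term_frequencies(docs):
    """Document frequency of every term, read from the main index's own terms."""
    df = Counter()
    for text in docs.values():
        df.update(set(text.lower().split()))
    return df


def suggest(df, word, max_edits=2, size=3):
    """Suggest indexed terms within `max_edits` edits: closest first, then most frequent."""
    word = word.lower()
    if word in df:
        return []  # the word exists in the index; nothing to correct
    scored = []
    for term, freq in df.items():
        distance = edit_distance(word, term)
        if distance <= max_edits:
            scored.append((distance, -freq, term))
    scored.sort()
    return [term for _, _, term in scored[:size]]


if __name__ == "__main__":
    df = term_frequencies(DOCS)
    print(suggest(df, "nikko"))  # ['nikon']
    print(suggest(df, "camra"))  # ['camera']
```

Because the suggestions are read from the same index that serves the search, there is no second index to keep in sync across shards, which is the distributed-system problem raised above.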
In reply to Michał Orzechowski (Tuesday, February 8, 2011, 2:53 PM).


(srrin) #6

Hi Shay,
Is there any example of how to use these phonetic analyzers and search documents?
This may help me implement one of my client's requests.


(Shay Banon) #7

This page has an example of how to set it up: http://www.elasticsearch.org/guide/reference/index-modules/analysis/phonetic-tokenfilter.html. Then you can reference the constructed analyzer by name in your mappings, on the fields where it applies.
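The flow described here (define an analyzer that includes a phonetic token filter, then reference it by name from a field mapping) can be mimicked in plain Python to show why it works: the same named analyzer runs at index time and at query time, so 'Nikkon' and 'Nikon' meet at the same code. This is a hypothetical sketch: the analyzer name `phonetic_name`, the `ANALYZERS` and `MAPPING` dicts, and the Soundex encoder are stand-ins, not Elasticsearch configuration syntax; the linked page has the real settings.

```python
from collections import defaultdict

# A tiny American Soundex encoder, standing in for the phonetic token filter.
_GROUPS = {"bfpv": "1", "cgjkqsxz": "2", "dt": "3", "l": "4", "mn": "5", "r": "6"}
CODES = {ch: d for letters, d in _GROUPS.items() for ch in letters}


def soundex(word):
    letters = [c for c in word.lower() if c in "abcdefghijklmnopqrstuvwxyz"]
    if not letters:
        return ""  # no letters (e.g. pure digits): the token is dropped below
    code, last = letters[0].upper(), CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "hw":
            continue
        digit = CODES.get(ch, "")
        if digit and digit != last:
            code += digit
            if len(code) == 4:
                break
        last = digit
    return code.ljust(4, "0")


# Hypothetical registry of named analyzers: whitespace tokenizer + token filters.
ANALYZERS = {"phonetic_name": [str.lower, soundex]}

# Hypothetical mapping: each field references its analyzer by name.
MAPPING = {"product_name": "phonetic_name"}


def analyze(analyzer_name, text):
    """Whitespace tokenizer, then each filter in order; empty tokens are dropped."""
    tokens = text.split()
    for token_filter in ANALYZERS[analyzer_name]:
        tokens = [t for t in map(token_filter, tokens) if t]
    return tokens


def index_documents(docs, field):
    """Index time: the field's text goes through the analyzer named in the mapping."""
    inverted = defaultdict(set)
    for doc_id, doc in docs.items():
        for token in analyze(MAPPING[field], doc[field]):
            inverted[token].add(doc_id)
    return inverted


def search(inverted, field, query):
    """Query time: the same analyzer runs on the query, so misspellings map to the same codes."""
    hits = set()
    for token in analyze(MAPPING[field], query):
        hits |= inverted.get(token, set())
    return hits


if __name__ == "__main__":
    docs = {
        1: {"product_name": "Digital SLR Camera Nikon D3100 14.2MP"},
        2: {"product_name": "Camera Nikon D60 10.2MP"},
    }
    inverted = index_documents(docs, "product_name")
    print(sorted(search(inverted, "product_name", "Nikkon")))  # [1, 2]
    print(sorted(search(inverted, "product_name", "Sony")))    # []
```

One side effect of this particular sketch: digits are discarded, so `D3100` and `D60` both become `D000` and model numbers stop being distinguishable. Indexing the same text into a second, non-phonetic field is one way to keep exact model numbers searchable.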
In reply to srrin (Friday, April 22, 2011, 8:53 AM).


(srrin) #8

Thank you, I will check this and get back to you if I need any clarification.

SRR
