Completion suggestions with strings containing only numbers and strings prefixed with numbers (e.g. addresses) not working


(Mahesh Kommareddi) #1

Hi,
I'm using and testing against 0.90.5 (three nodes and five shards) and
0.90.7 (one node and five shards). I'm interested in the completion
suggester and it works great for items where letters are involved. For
example, first names, last names, states, cities, or product names all
return suggestions. This is with the help of the documentation that is on
the Elasticsearch website:

http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-suggesters-completion.html

As an example, I created a 'numbers' index with a type named 'zipcode'. The
stored document contains a field that is a string:
https://gist.github.com/mkommar/7586804

I receive: {"ok":true,"acknowledged":true}

Next, I insert some zipcodes as strings:
https://gist.github.com/mkommar/7586857

I receive:
{"ok":true,"_index":"numbers","_type":"zipcode","_id":"1","_version":1}
{"ok":true,"_index":"numbers","_type":"zipcode","_id":"2","_version":1}

Now the suggestion:
https://gist.github.com/mkommar/7587131

I expect the "90210" string to be returned. However, I get no result:
{
"_shards" : {
"total" : 5,
"successful" : 5,
"failed" : 0
},
"numbers-suggest" : [ {
"text" : "902",
"offset" : 0,
"length" : 3,
"options" : [ ]
} ]
}

Just to be sure, I insert an item that contains only alphabetical characters
and query for a suggestion:
https://gist.github.com/mkommar/7587188

Which returns a result as expected:
{
"_shards" : {
"total" : 5,
"successful" : 5,
"failed" : 0
},
"numbers-suggest" : [ {
"text" : "bl",
"offset" : 0,
"length" : 2,
"options" : [ {
"text" : "Blah blah",
"score" : 1.0, "payload" : {"area code":0}
} ]
} ]
}

It appears that zipcodes ("90210", "29268", "28262"), phone numbers
("7042315555", "5555555555", "9999999999"), and addresses starting with
numbers ("123 Rose Ln", "111 White St", etc.) are not suggested even though
all these items are stored as strings. Is this a limitation? If so, is
there a way around it?

Thanks,
Mahesh

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.


(Alexander Reelsen) #2

Hey,

Try running the analyze API on a number with the simple analyzer and
compare it with the output of the standard analyzer:

» curl 'localhost:9200/_analyze?analyzer=standard' -d '12345'
{"tokens":[{"token":"12345","start_offset":0,"end_offset":5,"type":"<NUM>","position":1}]}
» curl 'localhost:9200/_analyze?analyzer=simple' -d '12345'
{"tokens":[]}

As you can see, the simple analyzer removed the numbers entirely.

So the simple analyzer is most likely not useful in your case, since it
removes any numbers, which explains why you are not getting any results.
You should use another analyzer instead.
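
The difference can be imitated in a few lines of Python. This is only a rough sketch, not Lucene's actual code: `simple_analyze` assumes the simple analyzer keeps runs of letters and lowercases them, and `keep_digits_analyze` is a hypothetical stand-in for an analyzer that keeps digits, far cruder than the real standard analyzer.

```python
import re
from itertools import groupby


def simple_analyze(text):
    """Rough imitation of the simple analyzer: keep runs of letters, lowercased."""
    return [
        "".join(run).lower()
        for is_letter, run in groupby(text, key=str.isalpha)
        if is_letter
    ]


def keep_digits_analyze(text):
    """Hypothetical stand-in for a digit-preserving analyzer (not the real standard analyzer)."""
    return re.findall(r"\w+", text.lower())


# Compare both sketches on values like the ones in this thread.
for value in ["12345", "90210", "123 Rose Ln", "Blah blah"]:
    print(repr(value), simple_analyze(value), keep_digits_analyze(value))
```

With the letters-only version, a value such as "90210" produces no tokens at all, so there is nothing left for a prefix like "902" to match.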

--Alex

On Thu, Nov 21, 2013 at 8:07 PM, Mahesh Kommareddi <
mahesh.kommareddi@gmail.com> wrote:

Hi,
I'm using and testing against 0.90.5 (three nodes and five shards) and
0.90.7 (one node and five shards). I'm interested in the completion
suggester and it works great for items where letters are involved. For
example, first names, last names, states, cities, or product names all
return suggestions. This is with the help of the documentation that is on
the Elasticsearch website:

http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-suggesters-completion.html

As an example, I created a 'numbers' index with a type named 'zipcode'.
The stored document contains a field that is a string:
https://gist.github.com/mkommar/7586804

I receive: {"ok":true,"acknowledged":true}

Next, I insert some zipcodes as strings:
https://gist.github.com/mkommar/7586857

I receive:
{"ok":true,"_index":"numbers","_type":"zipcode","_id":"1","_version":1}
{"ok":true,"_index":"numbers","_type":"zipcode","_id":"2","_version":1}

Now the suggestion:
https://gist.github.com/mkommar/7587131

I expect the "90210" string to be returned. However, I get no result:
{
"_shards" : {
"total" : 5,
"successful" : 5,
"failed" : 0
},
"numbers-suggest" : [ {
"text" : "902",
"offset" : 0,
"length" : 3,
"options" : [ ]
} ]
}

Just to be sure, I insert an item that contains alphabetical items only
and query for a suggestion:
https://gist.github.com/mkommar/7587188

Which returns a result as expected:
{
"_shards" : {
"total" : 5,
"successful" : 5,
"failed" : 0
},
"numbers-suggest" : [ {
"text" : "bl",
"offset" : 0,
"length" : 2,
"options" : [ {
"text" : "Blah blah",
"score" : 1.0, "payload" : {"area code":0}
} ]
} ]
}

It appears that zipcodes ("90210", "29268", "28262"), phone numbers
("7042315555", "5555555555", "9999999999"), and addresses starting with
numbers ("123 Rose Ln", "111 White St", etc.) are not suggested even though
all these items are stored as strings. Is this a limitation? If so, is
there a way around it?

Thanks,
Mahesh



(Mahesh Kommareddi) #3

Thanks a million!!

As you mentioned (for the benefit of others):

The standard analyzer "is built using the Standard Tokenizer
(http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/analysis-standard-tokenizer.html)
with the Standard Token Filter
(http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/analysis-standard-tokenfilter.html),
Lower Case Token Filter
(http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/analysis-lowercase-tokenfilter.html),
and Stop Token Filter
(http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/analysis-stop-tokenfilter.html)."

The simple analyzer "is built using a Lower Case Tokenizer
(http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/analysis-lowercase-tokenizer.html)."
Looking up the Lower Case Tokenizer in the docs says:
"It divides text at non-letters and converts them to lower case."

Analyzers Docs
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/analysis-analyzers.html

Standard Analyzer Doc
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/analysis-standard-analyzer.html

Simple Analyzer Doc
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/analysis-simple-analyzer.html

Lower Case Tokenizer Doc
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/analysis-lowercase-tokenizer.html

The updated mapping using a standard analyzer is here:
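
For readers without access to the linked mapping, here is a minimal sketch of what such a mapping and suggest request might look like, built as Python dictionaries and printed as JSON. The field name `suggest` is an assumption for illustration; the index `numbers`, the type `zipcode`, and the suggestion name `numbers-suggest` come from this thread. The parameter names (`index_analyzer`, `search_analyzer`, `payloads`) follow the completion suggester documentation for 0.90.x and should be checked against the version in use.

```python
import json

# Hypothetical mapping for the "zipcode" type: the "suggest" field name is assumed.
# The only change from a simple-analyzer setup is the analyzer name.
mapping = {
    "zipcode": {
        "properties": {
            "suggest": {
                "type": "completion",
                "index_analyzer": "standard",
                "search_analyzer": "standard",
                "payloads": True,
            }
        }
    }
}

# Suggest request for the prefix "902" against the assumed "suggest" field.
suggest_request = {
    "numbers-suggest": {
        "text": "902",
        "completion": {"field": "suggest"},
    }
}

# JSON bodies one might send with curl, e.g. to the mapping and _suggest endpoints.
print(json.dumps(mapping, indent=2))
print(json.dumps(suggest_request, indent=2))
```

Because the analyzer is applied when documents are indexed, existing documents would presumably need to be reindexed after changing the mapping.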

I really appreciate it.

Mahesh


