Problem in search when using custom analyzer


(Robbie) #1

Hi, I have a use case where I want to search using wildcards on a field
with a custom analyzer. I am currently seeing some discrepancies from what
I expect.

Basically, I have string data in a field, such as "Name-55", "Name-56" etc.
I want to be able to search for "Name-5*", and get these results.

I have indexed the data into these terms
"Name", "-", "55"
"Name", "-", "56"

using a custom pattern analyzer to achieve this. I am using a similar
custom pattern analyzer for my query string, except that it also swallows
any whitespace, &, ? and *.

{
  "my_template" : {
    "template" : "",
    "order" : 1,
    "settings" : {
      "analysis" : {
        "analyzer" : {
          "custom_index" : {
            "type" : "pattern",
            "pattern" : "([\\s]+)|((?<=\\p{L})(?=\\P{L})|((?<=\\P{L})(?=\\p{L}))|((?<=\\d)(?=\\D))|((?<=\\D)(?=\\d)))"
          },
          "custom_search" : {
            "type" : "pattern",
            "pattern" : "([?&\\s]+)|((?<=\\p{L})(?=\\P{L})|((?<=\\P{L})(?=\\p{L}))|((?<=\\d)(?=\\D))|((?<=\\D)(?=\\d)))"
          }
        }
      }
    },
    "mappings" : {
      "account" : {
        "properties" : {
          "myfield" : {
            "type" : "string",
            "store" : "yes",
            "index" : "analyzed",
            "index_analyzer" : "custom_index",
            "search_analyzer" : "custom_search"
          }
        }
      }
    }
  }
}
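
As a rough check of the index-side tokenization, here is a minimal Python sketch of what the custom_index pattern does to the sample values. It is an approximation, not the analyzer itself: Python's built-in re module has no \p{L}, so [^\W\d_] and [\W\d_] stand in for \p{L} and \P{L}, and the lowercasing step assumes the pattern analyzer's default of lowercasing terms. The name analyze is made up for this illustration.

```python
import re

# Stand-ins for \p{L} / \P{L}, which Python's built-in `re` does not support.
LETTER, NON_LETTER = r"[^\W\d_]", r"[\W\d_]"

# Same alternatives as the custom_index pattern, written without capturing
# groups so that re.split() does not echo delimiters back into the result.
INDEX_PATTERN = re.compile(
    r"\s+"
    rf"|(?<={LETTER})(?={NON_LETTER})"   # letter followed by non-letter
    rf"|(?<={NON_LETTER})(?={LETTER})"   # non-letter followed by letter
    r"|(?<=\d)(?=\D)"                    # digit followed by non-digit
    r"|(?<=\D)(?=\d)"                    # non-digit followed by digit
)


def analyze(text):
    """Split on the pattern, drop empty pieces, lowercase each term."""
    return [piece.lower() for piece in INDEX_PATTERN.split(text) if piece]


print(analyze("Name-55"))
print(analyze("Name-56"))
```

Splitting on zero-width matches requires Python 3.7 or later. Note that the resulting terms are lowercase, so any search term that bypasses analysis has to be lowercase as well to match.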

Using this, I see that when I search for "Name-5*", I do not get any
results returned.

However, if I search for "Name- 5*" (Note additional white-space in the
search string), then I get the results Name-55 and Name-56.
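
One possible reading of this difference, sketched below as a toy model in Python. It assumes the search is sent through a query_string-style parser (the post does not say which query type is used). In Elasticsearch versions of that era, such a parser splits the input on whitespace before any analyzer runs, and a chunk containing a wildcard is not passed through the search analyzer unless analyze_wildcard is enabled. The set INDEXED_TERMS and the function wildcard_hits are hypothetical names for this illustration only.

```python
from fnmatch import fnmatchcase

# Terms assumed to be in the index for "Name-55" and "Name-56"
# (lowercased, as the pattern analyzer does by default).
INDEXED_TERMS = {"name", "-", "55", "56"}


def wildcard_hits(query):
    """Toy model: split the query on whitespace first, then treat every chunk
    containing '*' as a single un-analyzed, lowercased wildcard term and
    list the indexed terms it would match."""
    hits = {}
    for chunk in query.split():
        if "*" in chunk:
            pattern = chunk.lower()
            hits[pattern] = sorted(
                term for term in INDEXED_TERMS if fnmatchcase(term, pattern)
            )
    return hits


print(wildcard_hits("Name-5*"))   # one wildcard term: name-5*
print(wildcard_hits("Name- 5*"))  # the wildcard term is only 5*
```

Under these assumptions, "Name-5*" becomes the single wildcard term name-5*, and no indexed term starts with name-5 because the index holds name, - and 55 separately. With the extra space, the wildcard part is just 5*, which matches 55 and 56.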

Do you have an understanding of why Elasticsearch may be exhibiting this
behavior? Is there some issue in the way I have set up the patterns in my
analyzer?

Your help is much appreciated, Thanks!

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/c0a3f237-ea6a-43a7-aefb-c22eb61c75f4%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


(vineeth mohan-2) #2

Hello Robbie,

I think you are approaching this the wrong way.

You can use the wildcard query:
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl-wildcard-query.html

However, with the example you have given, this might not work as-is.
This is because, by default, the analyzer in ES will tokenize your Name-55
into ["Name", "55"].
Hence it would be a good idea to store the text as it is and enable a
plain whitespace tokenizer:
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/analysis-whitespace-tokenizer.html

After this, use the wildcard query to search.
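
A minimal Python sketch of this suggestion follows. It builds the wildcard query body for the field myfield from the original post and simulates, with the standard library only, how a wildcard pattern compares against whole whitespace-separated tokens. The helper names and the sample value "Name-61" are made up for illustration, and fnmatchcase is only a stand-in for the real wildcard matching.

```python
import json
from fnmatch import fnmatchcase

# Request body for a wildcard query on the field from the original post.
wildcard_query = {"query": {"wildcard": {"myfield": "Name-5*"}}}


def whitespace_tokens(text):
    """Whitespace tokenizer stand-in: split on whitespace, change nothing else."""
    return text.split()


def matches(doc_text, pattern):
    """True if any whole token matches the wildcard pattern (case-sensitive)."""
    return any(fnmatchcase(tok, pattern) for tok in whitespace_tokens(doc_text))


print(json.dumps(wildcard_query))
# "Name-61" is a made-up value added to show a non-match.
print([d for d in ["Name-55", "Name-56", "Name-61"] if matches(d, "Name-5*")])
```

Because the whitespace tokenizer keeps "Name-55" as one token and does not lowercase it, the wildcard pattern has to use the same case as the stored text.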

Thanks
Vineeth

On Tue, Jul 1, 2014 at 10:00 AM, Robbie pherur@ciphercloud.com wrote:




(system) #3