Disable custom analyzer for prefix filter


(George Sakkis) #1

Hi all,

I have defined a custom default analyzer that works fine for "text"
and "query_string" queries but is problematic for prefix filters.
Here is a simpler example using the built-in "english" analyzer:

curl -XPUT 'http://localhost:9200/twitter' -d '{"analysis": {"analyzer": {"default": {"type": "english"}}}}'
curl -XPUT 'http://localhost:9200/twitter/tweet/1' -d '{"title": "User setting"}'
curl -XPOST 'http://localhost:9200/twitter/_refresh'

The following match (good):

curl -s -XGET 'http://localhost:9200/twitter/tweet/_count?pretty=true' -d '{"text": {"_all": "setting"}}'
curl -s -XGET 'http://localhost:9200/twitter/tweet/_count?pretty=true' -d '{"text": {"_all": "settings"}}'

curl -s -XGET 'http://localhost:9200/twitter/tweet/_count?pretty=true' -d '{"prefix": {"_all": "s"}}'
curl -s -XGET 'http://localhost:9200/twitter/tweet/_count?pretty=true' -d '{"prefix": {"_all": "se"}}'
curl -s -XGET 'http://localhost:9200/twitter/tweet/_count?pretty=true' -d '{"prefix": {"_all": "set"}}'

The following don't match (not good):

curl -s -XGET 'http://localhost:9200/twitter/tweet/_count?pretty=true' -d '{"prefix": {"_all": "sett"}}'
curl -s -XGET 'http://localhost:9200/twitter/tweet/_count?pretty=true' -d '{"prefix": {"_all": "setting"}}'

Is there a way to make it work as desired without a second index?

Thanks,
George


(simonw-2) #2

On Thursday, July 19, 2012 5:01:34 PM UTC+2, George Sakkis wrote:

> Is there a way to make it work as desired without a second index?

The english analyzer uses a stemmer that stems "setting" to "set". The
"text" query runs its input through the same analyzer, so "setting" and
"settings" both become "set" and match. If you use a prefix query, no
analysis is applied to the query text, which is why "sett" and "setting"
don't match anything: neither is a prefix of the indexed term "set". I'd
actually argue that you need prefix matching if you are already stemming.
Yet in your example you might not want to stem at all, but I know almost
nothing about your search scenario. Can you provide more info about what
you are trying to do?

simon

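A minimal toy model of the behaviour described above, in POSIX shell and not Elasticsearch code. It assumes, per the explanation, that the english analyzer indexes "User setting" as the two terms "user" and "set", and it uses a hypothetical "toy_stem" lookup table in place of the real stemmer. The "text" query analyzes its input and then looks for an exact term; the prefix query compares the raw, unanalyzed input against the indexed terms.

```shell
# Toy model only (not Elasticsearch code).
# Assumed terms the english analyzer indexes for "User setting".
indexed_terms="user set"

# Hypothetical stand-in for the stemmer: covers only the words in this thread.
toy_stem() {
  case "$1" in
    setting|settings) echo "set" ;;
    *) echo "$1" ;;
  esac
}

# "text" query: analyze the input, then require an exact term match (prints 1 or 0).
text_count() {
  q=$(toy_stem "$1")
  for t in $indexed_terms; do
    if [ "$t" = "$q" ]; then echo 1; return; fi
  done
  echo 0
}

# "prefix" query: no analysis; any indexed term starting with the input matches.
prefix_count() {
  for t in $indexed_terms; do
    case "$t" in
      "$1"*) echo 1; return ;;
    esac
  done
  echo 0
}

for q in setting settings; do echo "text $q -> $(text_count "$q")"; done
for p in s se set sett setting; do echo "prefix $p -> $(prefix_count "$p")"; done
```

Running it prints 1 for both text queries and for the prefixes s, se and set, and 0 for sett and setting, mirroring the counts reported in the first post.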


(George Sakkis) #3

On Friday, July 20, 2012 5:44:56 PM UTC+2, simonw wrote:

> Can you provide more info about what you are trying to do?

Basically there are two scenarios. Prefix filtering is used for updating a
live list of matches as a user is typing in a search box one character at a
time. In this scenario I don't want any analyzer to be applied as the query
is not fully formed yet. In the other scenario the query is finalized and I
want to use a custom analyzer for stemming, stoplisting and whatnot.

George
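One way to serve both scenarios from a single index is to index the title twice, once with the stemming analyzer and once with an analyzer that only lowercases, and to send the type-ahead prefix queries to the unstemmed copy. In Elasticsearch versions of this era the multi_field mapping type was the usual tool for this; treat the exact mapping syntax as something to check against the docs for your version. The toy shell model below is not Elasticsearch code: the field names "title" and "title.raw" and both term lists are assumptions, chosen to show why the unstemmed copy fixes the "sett" and "setting" cases.

```shell
# Toy model only (not Elasticsearch code). Assumed term lists for "User setting":
stemmed_terms="user set"    # hypothetical "title" field, english analyzer
raw_terms="user setting"    # hypothetical "title.raw" field, lowercased only

# Unanalyzed prefix match: prints 1 if any term in $1 starts with $2, else 0.
prefix_count_in() {
  for t in $1; do
    case "$t" in
      "$2"*) echo 1; return ;;
    esac
  done
  echo 0
}

# Type-ahead: the same prefixes against the stemmed and the unstemmed copy.
for p in s se set sett setting; do
  echo "$p: stemmed=$(prefix_count_in "$stemmed_terms" "$p") raw=$(prefix_count_in "$raw_terms" "$p")"
done
```

Only the raw copy matches "sett" and "setting", while the stemmed copy remains available for the finalized queries that want stemming.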

