Search and index analyzer not working as expected


(James Wilson) #1

Short description: searching by the stem word works ("meet") but using
the word in the source document ("meeting") doesn't. I'm missing
something here that's probably fundamental, but it's not explained
anywhere.

I thought a query of "meeting" would be stemmed to "meet" and then it
would find a document that was analyzed the same way at index time.

elasticsearch.yml:

index:
  analysis:
    analyzer:
      my_analyzer: # duplicate the snowball analyzer
        type: custom
        tokenizer: standard
        filter: [standard, lowercase, stop, snowball]

Setup:

curl -XPUT 'http://localhost:9200/twitter'
curl -XPUT 'http://localhost:9200/twitter/tweet/_mapping' -d '
{"tweet" : { "search_analyzer": "my_analyzer", "index_analyzer":
"my_analyzer", "properties": { "user": { "type": "string" },
"message": {"type": "string" } } } }'
curl -XPUT 'http://localhost:9200/twitter/tweet/1' -d '
{"user": "bob", "message": "team meeting"}'

Queries:

curl -XGET 'http://localhost:9200/twitter/tweet/_search' -d '
{"query" : {"text": {"_all": "meeting"} } }'
--> no results

curl -XGET 'http://localhost:9200/twitter/tweet/_search' -d '
{"query" : {"text": {"_all": "meet"} } }'
--> returns the document


(erlo) #2

If you want to search on the special _all field it seems that you should
specify the analyzer in the query too:

Try this:

curl -XGET 'http://localhost:9200/twitter/tweet/_search' -d '
{
  "query": {
    "text": {
      "_all": {
        "query": "meeting",
        "analyzer": "my_analyzer"
      }
    }
  }
}'



(Shay Banon) #3

Heya,

When searching "against a type", matching docs automatically get
filtered to that type only, but the search_analyzer associated with that
type is not applied. It should be, btw; I opened an issue:
https://github.com/elasticsearch/elasticsearch/issues/1391. For now, you can
explicitly specify the analyzer in the query, or make it the default
analyzer for that index (across types) by simply renaming my_analyzer to
default.

-shay.banon
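
A rough sketch of the renaming workaround, assuming the analyzer definition is otherwise unchanged from the elasticsearch.yml in the original post. Only the key name differs; everything else is copied from that config, and this is untested against the reporter's setup.

```yaml
index:
  analysis:
    analyzer:
      default: # was my_analyzer; intended to apply across all types in the index
        type: custom
        tokenizer: standard
        filter: [standard, lowercase, stop, snowball]
```

With the analyzer named default, the original "meeting" query against _all should be stemmed to "meet" without naming an analyzer in the query. It is probably safest to recreate the index and reindex the document after the change, so that index-time and search-time analysis are known to agree.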

On Tue, Oct 11, 2011 at 12:40 AM, James Wilson jwilson556@gmail.com wrote:

Short description: searching by the stem word works ("meet") but using
the word in the source document ("meeting") doesn't. I'm missing
something here that's probably fundamental, but it's not explained
anywhere.

I thought a query of "meeting" would be stemmed to "meet" and then it
would find a document that was analyzed the same way at index time.


