Hi,
I've been trying out some custom analyzers on my ES instance (version
0.20.4) and have run into some problems when searching. I've been
following this simple
example: http://mnylen.tumblr.com/post/22963609412/elasticsearch-and-a-simple-contains-search
Here's how I set up my data:
curl -XPUT "http://localhost:9200/catalog/?pretty" -d '{
  "mappings" : {
    "product" : {
      "properties" : {
        "title" : {
          "type" : "string",
          "search_analyzer" : "str_search_analyzer",
          "index_analyzer" : "str_index_analyzer"
        }
      }
    }
  },
  "settings" : {
    "analysis" : {
      "analyzer" : {
        "str_search_analyzer" : {
          "tokenizer" : "whitespace",
          "filter" : ["lowercase"]
        },
        "str_index_analyzer" : {
          "tokenizer" : "whitespace",
          "filter" : ["lowercase", "substring"]
        }
      },
      "filter" : {
        "substring" : {
          "type" : "nGram",
          "min_gram" : 6,
          "max_gram" : 20
        }
      }
    }
  }
}';
curl -XPOST "http://localhost:9200/catalog/product?pretty" -d '{
  "title" : "Logitech Wireless Keyboard K350"
}';
curl -XPOST "http://localhost:9200/catalog/product?pretty" -d '{
  "title" : "Das Keyboard"
}'
The only things I changed from the example are the tokenizer ("keyword" is
now "whitespace") and "min_gram", which is now 6 instead of 1.
The first problem is that searching for a token doesn't work unless I
specify a field.
The tokens for the word "Logitech" are "gitech", "logite", "logitec",
"logitech", "ogitec" and "ogitech". So I would expect that searching for any
of those substrings would return the Logitech result. However, this returns
no results:
http://localhost:9200/catalog/_search?q=ogitec
{"took": 9,"timed_out": false,"_shards": {"total": 5,"successful": 5,"failed": 0},"hits": {"total": 0,"max_score": null,"hits": []}}
If I do the same search and specify title, only then do I get the expected
result:
http://localhost:9200/catalog/_search?q=title:ogitec
{"took": 4,"timed_out": false,"_shards": {"total": 5,"successful": 5,"failed": 0},"hits": {"total": 1,"max_score": 0.067124054,"hits": [{"_index": "catalog","_type": "product","_id": "J4trtjx9Rwm5jdI0kqmygA","_score": 0.067124054,"_source": {"title": "Logitech Wireless Keyboard K350"}}]}}
Does anyone know why this is happening?
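For reference, the token list I gave for "Logitech" can be reproduced outside Elasticsearch with a small shell sketch. This is only an approximation of what str_index_analyzer does, assuming the nGram filter emits every substring of a lowercased token that is between min_gram and max_gram characters long. The ngrams function is a hypothetical helper written for this post, not part of Elasticsearch, and it handles a single whitespace-free token only.

```shell
#!/bin/sh
# Hypothetical helper: print every substring of a word whose length is
# between min and max, after lowercasing the word. This only approximates
# the "lowercase" + "substring" filter chain for a single token.
ngrams() {
  word=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
  min=$2
  max=$3
  len=${#word}
  n=$min
  # Gram lengths run from min up to max, capped at the word length.
  while [ "$n" -le "$max" ] && [ "$n" -le "$len" ]; do
    i=1
    # Slide a window of n characters across the word (cut is 1-based).
    while [ $((i + n - 1)) -le "$len" ]; do
      printf '%s\n' "$word" | cut -c "$i-$((i + n - 1))"
      i=$((i + 1))
    done
    n=$((n + 1))
  done
}

# Same settings as the "substring" filter above: min_gram 6, max_gram 20.
ngrams "Logitech" 6 20 | sort
```

With min_gram 6 and max_gram 20 this prints the same six tokens listed above, and a word shorter than six characters (such as "Das") produces no tokens at all.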
The second thing I noticed (using the same data setup) is that when I
analyze the string I'm searching for, I get two different token types. Using
the default analyzer on the catalog index shows the type as <ALPHANUM>:
http://localhost:9200/catalog/_analyze?pretty&text=ogitec
{"tokens": [{"token": "ogitec","start_offset": 0,"end_offset": 6,"type": "<ALPHANUM>","position": 1}]}
But using my custom analyzer shows the type as a "word":
http://localhost:9200/catalog/_analyze?pretty&text=ogitec&analyzer=str_search_analyzer
{"tokens": [{"token": "ogitec","start_offset": 0,"end_offset": 6,"type": "word","position": 1}]}
I'm wondering if this has something to do with the problem above
(searching over all fields in an index as opposed to a specified one), and
why it is happening.
Any help is appreciated.
(Also, sorry for the length of the post. I didn't want to leave out any
information)