edgeNGram weirdness


(Axsuul) #1

Hi,

I'm having trouble getting an edgeNGram query to behave properly. I have one record, "blue grass", indexed with an edgeNGram min_gram of 2. However, a query string of "blv" returns "blue grass", although it shouldn't.

curl -X POST http://localhost:9200/test -d '{
  "mappings": {
    "product/fragrance": {
      "properties": {
        "name_query": {
          "index_analyzer": "query_index_analyzer",
          "search_anaylzer": "query_search_analyzer",
          "as": {},
          "type": "string"
        }
      }
    }
  },
  "settings": {
    "analysis": {
      "filter": {
        "query_edgengram": {
          "type": "edgeNGram",
          "min_gram": 2,
          "max_gram": 20,
          "side": "front"
        }
      },
      "analyzer": {
        "query_index_analyzer": {
          "tokenizer": "lowercase",
          "filter": ["asciifolding", "query_edgengram"]
        },
        "query_search_analyzer": {
          "tokenizer": "lowercase",
          "filter": ["asciifolding"]
        }
      }
    }
  }
}'

curl -X POST "http://localhost:9200/test/product%2Ffragrance/1" -d '{
  "name_query": "blue grass"
}'

curl -X GET "http://localhost:9200/test/product%2Ffragrance/_search?load=true&pretty=true" -d '{
  "query": {
    "bool": {
      "must": [{
        "query_string": {
          "query": "blv",
          "fields": ["name_query"],
          "default_operator": "OR"
        }
      }]
    }
  }
}'

For some reason, that search returns a result. Can anyone explain why? What I want is for "bl" to match "blue grass" but for "blv" not to. Using the analyze API, I can see "blue grass" being broken down into "bl", "blu", "blue", "gr", "gra", "gras", "grass", and "blv" doesn't match any of those. Thanks.
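To make the expected behaviour concrete, here is a minimal Python sketch that imitates the two analyzers outside Elasticsearch. It is only an approximation: the helper names (lowercase_tokenize, ascii_fold, edge_ngrams, index_analyze, search_analyze) are hypothetical, and the tokenizer and folding logic are simplified stand-ins for the real Lucene components. It shows that the query term "blv" has no counterpart among the indexed grams when it is analyzed without the edgeNGram filter, and that it would overlap on "bl" if the query text were itself run through the index-time analyzer. Whether that is what happens in the setup above is an assumption the thread does not confirm, although the mapping does spell the key "search_anaylzer".

```python
import re
import unicodedata


def lowercase_tokenize(text):
    # Approximation of the "lowercase" tokenizer: split on non-letters, then lowercase.
    return [t.lower() for t in re.findall(r"[^\W\d_]+", text)]


def ascii_fold(token):
    # Rough stand-in for the asciifolding filter: drop diacritics where possible.
    return unicodedata.normalize("NFKD", token).encode("ascii", "ignore").decode("ascii")


def edge_ngrams(token, min_gram=2, max_gram=20):
    # Front-side edge n-grams: prefixes of length min_gram..max_gram.
    return [token[:n] for n in range(min_gram, min(max_gram, len(token)) + 1)]


def index_analyze(text):
    # Mirrors query_index_analyzer: lowercase -> asciifolding -> edgeNGram.
    return [g for t in lowercase_tokenize(text) for g in edge_ngrams(ascii_fold(t))]


def search_analyze(text):
    # Mirrors query_search_analyzer: lowercase -> asciifolding, no n-grams.
    return [ascii_fold(t) for t in lowercase_tokenize(text)]


indexed = index_analyze("blue grass")
print(indexed)

# Query analyzed as intended: no term in common with the indexed grams.
print(set(search_analyze("blv")) & set(indexed))

# Query analyzed with the index-time analyzer (assumed scenario): "bl" overlaps.
print(set(index_analyze("blv")) & set(indexed))
```

If the third line of output is the situation in the real index, comparing the analyze API output for the query text under each analyzer would be one way to check it.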


(David Pilato) #2

I answered on Stack Overflow:

http://stackoverflow.com/questions/12909844/query-string-returning-results-not-found-in-edgengram/12911582#12911582

David :wink:
Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs

In reply to Axsuul axsuul@gmail.com, 16 Oct 2012, 10:39.


