edgeNGram weirdness

Hi,

I'm having trouble getting an edgeNGram query to behave properly. I have one record, "blue grass", indexed with an edgeNGram filter whose min_gram is 2. A query string of "blv" nevertheless returns "blue grass", although it shouldn't.

curl -X POST http://localhost:9200/test -d '{
  "mappings": {
    "product/fragrance": {
      "properties": {
        "name_query": {
          "index_analyzer": "query_index_analyzer",
          "search_anaylzer": "query_search_analyzer",
          "as": {},
          "type": "string"
        }
      }
    }
  },
  "settings": {
    "analysis": {
      "filter": {
        "query_edgengram": {
          "type": "edgeNGram",
          "min_gram": 2,
          "max_gram": 20,
          "side": "front"
        }
      },
      "analyzer": {
        "query_index_analyzer": {
          "tokenizer": "lowercase",
          "filter": ["asciifolding", "query_edgengram"]
        },
        "query_search_analyzer": {
          "tokenizer": "lowercase",
          "filter": ["asciifolding"]
        }
      }
    }
  }
}'

curl -X POST "http://localhost:9200/test/product%2Ffragrance/1" -d '{
  "name_query": "blue grass"
}'

curl -X GET "http://localhost:9200/test/product%2Ffragrance/_search?load=true&pretty=true" -d '{
  "query": {
    "bool": {
      "must": [{
        "query_string": {
          "query": "blv",
          "fields": ["name_query"],
          "default_operator": "OR"
        }
      }]
    }
  }
}'

For some reason, I get a result from that. Can anyone explain why? What I want is for "blv" not to return "blue grass", while "bl" should. I've used the analyze API and see "blue grass" broken down into "bl", "blu", "blue", "gr", "gra", "gras", "grass", and "blv" doesn't match any of those. Thanks.
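For readers without a cluster at hand, the token list above can be reproduced with a small Python sketch. It is only an approximation of the index-time analyzer: `edge_ngrams` and `index_tokens` are hypothetical helper names, the regular expression is a rough stand-in for the "lowercase" tokenizer, and the "asciifolding" filter is left out because the sample text is plain ASCII.

```python
import re


def edge_ngrams(token, min_gram=2, max_gram=20):
    """Front-edge n-grams of one token, using the min_gram/max_gram from the filter settings."""
    longest = min(len(token), max_gram)
    return [token[:n] for n in range(min_gram, longest + 1)]


def index_tokens(text):
    """Approximate the index analyzer: lowercase, split on non-letters, then edge n-gram."""
    words = re.findall(r"[^\W\d_]+", text.lower())
    return [gram for word in words for gram in edge_ngrams(word)]


indexed = index_tokens("blue grass")
print(indexed)           # ['bl', 'blu', 'blue', 'gr', 'gra', 'gras', 'grass']
print("blv" in indexed)  # False
print("bl" in indexed)   # True
```

If the query term reached the index unchanged, "blv" would therefore find nothing, which matches the expectation stated above.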

I answered on Stack Overflow:

elasticsearch - Query string returning results not found in edgeNGram - Stack Overflow
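The Stack Overflow answer itself is not reproduced in this thread. As a sketch of one way such a match could arise, assume (purely for illustration) that the query term is also edge-n-grammed at search time, for instance if `query_search_analyzer` were not the analyzer actually applied to the query; note that the mapping spells the key `search_anaylzer`. Whether that is what happens in this setup is not established here. The helper name `edge_ngrams` is hypothetical.

```python
def edge_ngrams(token, min_gram=2, max_gram=20):
    """Front-edge n-grams of one token (same min_gram/max_gram as the filter settings)."""
    return [token[:n] for n in range(min_gram, min(len(token), max_gram) + 1)]


# Tokens stored for the document "blue grass".
indexed = {gram for word in "blue grass".split() for gram in edge_ngrams(word)}

query = "blv"

# Intended behaviour: the search analyzer keeps the query term whole.
kept_whole = {query}

# Assumed behaviour: the query term is edge-n-grammed like the indexed text.
ngrammed = set(edge_ngrams(query))

print(sorted(indexed & kept_whole))  # []
print(sorted(indexed & ngrammed))    # ['bl']
```

Under that assumption, the query expands to "bl" and "blv", and with "default_operator": "OR" the shared token "bl" alone is enough to return "blue grass".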

David :wink:
Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs

On 16 Oct 2012, at 10:39, Axsuul axsuul@gmail.com wrote:

--
View this message in context: http://elasticsearch-users.115913.n3.nabble.com/edgeNGram-weirdness-tp4024036.html
Sent from the Elasticsearch Users mailing list archive at Nabble.com.
