Hello - I am having a problem indexing and searching for words that may or
may not contain whitespace. Below is an example.
Here is how the index is created:
curl -s -XPUT 'localhost:9200/test/name/1' -d '{ "street": "Lakeshore Dr" }'
curl -s -XPUT 'localhost:9200/test/name/2' -d '{ "street": "Sunnyshore Dr" }'
curl -s -XPUT 'localhost:9200/test/name/3' -d '{ "street": "Lake View Dr" }'
curl -s -XPUT 'localhost:9200/test/name/4' -d '{ "street": "Shore Dr" }'
If I want to query for record 1 ("Lakeshore Dr"), I can use the following
query:
curl -s -XGET 'localhost:9200/test/name/_search?pretty=true' -d '{
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "street": {
              "query": "lakeshore dr",
              "type": "phrase"
            }
          }
        }
      ]
    }
  }
}'
This returns the desired result, document id 1. But if a user searches
for "Lake Shore Dr" (with a space between Lake and Shore), document id 1
should still be returned.
And the inverse of this problem is if a user searches for "Lakeview Dr"
(but indexed as "Lake View Dr"):
curl -s -XGET 'localhost:9200/test/name/_search?pretty=true' -d '{
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "street": {
              "query": "lakeview dr",
              "type": "phrase"
            }
          }
        }
      ]
    }
  }
}'
The search matches no documents. If the search is changed to a boolean
search instead of a phrase search, many docs will match on "dr", but
doc #3, "Lake View Dr", is not necessarily returned as the top match.
Should I use NGrams at index time? NGrams at search time? Remove whitespace
at index time and/or search time?
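To make the whitespace-removal idea concrete, here is a minimal shell sketch of the normalization I have in mind. It is only an illustration outside of Elasticsearch: `normalize` is a hypothetical helper, not an Elasticsearch API, and I am assuming the same effect could be achieved inside an analyzer applied at both index and search time.

```shell
# Hypothetical helper (not part of Elasticsearch): lowercase the input and
# delete all whitespace, so spelling variants collapse to one comparison key.
normalize() {
  printf '%s' "$1" | tr '[:upper:]' '[:lower:]' | tr -d '[:space:]'
}

# Indexed value and both user spellings produce the same key.
normalize "Lakeshore Dr";  echo   # lakeshoredr
normalize "Lake Shore Dr"; echo   # lakeshoredr

# The inverse case: indexed with a space, searched without one.
normalize "Lake View Dr";  echo   # lakeviewdr
normalize "lakeview dr";   echo   # lakeviewdr
```

With this approach "Shore Dr" becomes "shoredr", which does not equal "lakeshoredr", so docs 1 and 4 stay distinct. The obvious cost is that the whole street name becomes a single key, so a partial search such as "lakeshore" alone would no longer match unless it is combined with something like NGrams.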
Any suggestions would be appreciated. Thanks.