Exact phrase match - city names example


(Greg Silin) #1

Hi,
One of our fields in the index stores city names, and we need to ensure
that the term is matched exactly.

So if we have "san francisco" indexed, we need to ensure that only the
term "san francisco" matches; "san" or "francisco" or "south san francisco"
should all be misses.

In particular, I don't have a solution on how to make sure "san francisco"
does not match against "south san francisco"

Thanks
-greg


(Colin Dellow) #2

Does "index": "not_analyzed" not work for you (
http://www.elasticsearch.org/guide/reference/mapping/core-types.html) ?
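
To illustrate why not_analyzed helps here, below is a minimal Python sketch. It is not Elasticsearch code. It assumes an analyzed field is roughly "lowercase and split on whitespace", and that a not_analyzed field stores the whole value as a single term. The helper names (analyzed_terms, not_analyzed_terms, build_index, lookup) are hypothetical.

```python
from collections import defaultdict


def analyzed_terms(value):
    # Rough stand-in for an analyzed string field: lowercase, split on whitespace.
    return value.lower().split()


def not_analyzed_terms(value):
    # Stand-in for "index": "not_analyzed": the whole value is one term.
    return [value]


def build_index(docs, to_terms):
    # Map each term to the set of document ids that contain it.
    index = defaultdict(set)
    for doc_id, value in docs.items():
        for term in to_terms(value):
            index[term].add(doc_id)
    return index


def lookup(index, term):
    # Exact term lookup, with no analysis of the search term.
    return sorted(index.get(term, set()))


docs = {1: "san francisco", 2: "south san francisco", 3: "san"}

analyzed = build_index(docs, analyzed_terms)
exact = build_index(docs, not_analyzed_terms)

print(lookup(analyzed, "san"))         # every doc containing the token "san"
print(lookup(exact, "san francisco"))  # only the doc whose whole value matches
print(lookup(exact, "San Francisco"))  # no hit: the comparison is case-sensitive
```

In this sketch "south san francisco" is never returned for "san francisco" on the not_analyzed side, because the two values are different terms.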



(thalej) #3

Hi Greg - Did you ever find a solution for the question you posted? I am having the same issue and I am hoping you can post your solution. Thanks.


(thale jacobs) #4

I am having a similar problem. Here is how I set up the
test index:

Create the index:
curl -s -XPUT 'localhost:9200/test' -d '{
"mappings": {
"properties": {
"name": {
"street": {
"type": "string",
"index_analyzer": "not_analyzed",
"search_analyzer": "not_analyzed",
"index" : "not_analyzed"
}
}
}
}
}'

Insert some data:
curl -s -XPUT 'localhost:9200/test/name/5' -d '{ "street": ["E Main St"]}'
curl -s -XPUT 'localhost:9200/test/name/6' -d '{ "street": ["W Main St"] }'
curl -s -XPUT 'localhost:9200/test/name/7' -d '{ "street": ["East Main Rd"] }'
curl -s -XPUT 'localhost:9200/test/name/8' -d '{ "street": ["West Main Rd"] }'
curl -s -XPUT 'localhost:9200/test/name/9' -d '{ "street": ["Main"] }'
curl -s -XPUT 'localhost:9200/test/name/10' -d '{ "street": ["Main St"] }'

Now attempt to search for "Main". Not "Main St", not "East Main Rd": I
only want to return doc #9, "Main".
curl -s -XGET 'localhost:9200/test/_search?pretty=true' -d '{
"query":{
"bool":{
"must":[
{
"match":{
"street":{
"query":"main",
"type":"phrase",
"analyzer" : "keyword"
}
}
}
]
}
}
}';

The best-scoring document returned is "Main", but I don't know how to filter
out the others, which are not exact matches even though they contain matching
terms.
...
Here are the results from my example above:
"_score" : 0.2876821, "_source" : { "street": ["Main"] }
"_score" : 0.25316024, "_source" : { "street": ["East Main Rd"] }
"_score" : 0.25316024, "_source" : { "street": ["W Main St"] }
"_score" : 0.25316024, "_source" : { "street": ["E Main St"]}
"_score" : 0.1805489, "_source" : { "street": ["Main St"] }
"_score" : 0.14638957, "_source" : { "street": ["West Main Rd"] }

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/42921778-0a92-4a57-ab6f-7f089ebe95ec%40googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.


(Binh Ly-2) #5

Greg, to add to Colin's reply, try not_analyzed, or if you want
case-insensitive searches, then you can do a custom analyzer consisting of
keyword tokenizer + lowercase filter. You might also be interested in the
multi fields feature if you want to search on the same field in many
different ways:

http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/mapping-core-types.html#_multi_fields_3
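
As a rough illustration of the "keyword tokenizer + lowercase filter" idea, here is a small Python sketch. It is not Elasticsearch code. It assumes the keyword tokenizer emits the whole input as one token and that the same analysis is applied at index time and at search time. The function names are hypothetical.

```python
def keyword_tokenizer(text):
    # The whole input becomes a single token.
    return [text]


def lowercase_filter(tokens):
    # Lowercase every token.
    return [token.lower() for token in tokens]


def analyze(text):
    # Custom analyzer: keyword tokenizer followed by lowercase filter.
    return lowercase_filter(keyword_tokenizer(text))


streets = {5: "E Main St", 6: "W Main St", 9: "Main", 10: "Main St"}

# Build a tiny inverted index using the custom analyzer.
index = {}
for doc_id, street in streets.items():
    for term in analyze(street):
        index.setdefault(term, []).append(doc_id)


def search(query):
    # Analyze the query the same way, then look up the resulting term.
    hits = []
    for term in analyze(query):
        hits.extend(index.get(term, []))
    return sorted(hits)


print(search("MAIN"))     # matches only the doc whose full value is "Main"
print(search("main st"))  # matches only "Main St"
print(search("Main Rd"))  # no document has exactly this value
```

The match is still on the whole value, but case no longer matters because both sides are lowercased.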



(Binh Ly-2) #6

Thale,

Can you double-check the mapping? Something seems off to me. It should be
something like this:

{
"mappings": {
"name": {
"properties": {
"street": {
"type": "string",
"index" : "not_analyzed"
}
}
}
}
}

And don't forget: not_analyzed means matches are case-sensitive. :)



(thale jacobs) #7

Thanks for the reply, Binh Ly. I think the mapping in your example is
almost the same as the one I posted, and I believe they are functionally
equivalent. But my query against the not_analyzed field returns all the
docs with the word "Main" in them. On the query side I also thought I
could specify "analyzer" : "keyword", but I get the same results. You are
correct that something seems off, though: the case of the search term does
not seem to affect the results, which suggests to me that a search analyzer
is being used?



(thale jacobs) #8

Thanks for the reply, Prashy. I tried a term query like you suggested, and
I get the same results: all documents containing "main" are returned
(E Main St, W Main St, ...). Do you get only one document returned
(doc id 9, "Main") using the example I provided above?

On Thursday, February 27, 2014 2:25:09 AM UTC-5, Prashy wrote:

Try using a term query. A term query is not analyzed, so it might match
the exact term only.

{
"query" : {
"term" : { "street" : "xxx" }
}
}
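
A note on what "not analyzed" means for a term query: the search term is looked up as-is, so the outcome depends on how the field itself was indexed. The Python sketch below is not Elasticsearch code. It assumes, hypothetically, that the street field ended up analyzed (lowercased and split on whitespace, a crude stand-in for the default analysis), which is one situation in which a term query for "main" would return every street containing that word.

```python
def standard_like(text):
    # Crude stand-in for default analysis: lowercase, split on whitespace.
    return text.lower().split()


docs = {5: "E Main St", 7: "East Main Rd", 9: "Main", 10: "Main St"}

# Index the field as if it were analyzed.
index = {}
for doc_id, street in docs.items():
    for token in standard_like(street):
        index.setdefault(token, set()).add(doc_id)


def term_query(term):
    # The search term is NOT analyzed; it must equal an indexed term exactly.
    return sorted(index.get(term, set()))


print(term_query("main"))     # every doc that contains the token "main"
print(term_query("Main"))     # nothing: indexed tokens are lowercase
print(term_query("Main St"))  # nothing: no single token equals "Main St"
```

Under that assumption, a term query only narrows the results to doc 9 once the field is actually stored as a single, unanalyzed term.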

--
View this message in context:
http://elasticsearch-users.115913.n3.nabble.com/Exact-phrase-match-city-names-example-tp4019310p4050604.html
Sent from the ElasticSearch Users mailing list archive at Nabble.com.



(Binh Ly-2) #9

So if I do this:

curl -s -XPUT 'localhost:9200/test' -d '{
"mappings": {
"name": {
"properties": {
"street": {
"type": "string",
"index" : "not_analyzed"
}
}
}
}
}'

Then I do this:

curl -s -XPUT 'localhost:9200/test/name/5' -d '{ "street": ["E Main St"]}'
curl -s -XPUT 'localhost:9200/test/name/6' -d '{ "street": ["W Main St"] }'
curl -s -XPUT 'localhost:9200/test/name/7' -d '{ "street": ["East Main Rd"] }'
curl -s -XPUT 'localhost:9200/test/name/8' -d '{ "street": ["West Main Rd"] }'
curl -s -XPUT 'localhost:9200/test/name/9' -d '{ "street": ["Main"] }'
curl -s -XPUT 'localhost:9200/test/name/10' -d '{ "street": ["Main St"] }'

Then I do this:

curl -XGET "localhost:9200/test/_search?pretty" -d '{
"query":{
"bool":{
"must":[
{
"match":{
"street":{
"query":"Main"
}
}
}
]
}
}
}'

I get back this:

{
"hits" : {
"hits" : [ {
"_index" : "test",
"_type" : "name",
"_id" : "9",
"_score" : 0.30685282, "_source" : { "street": ["Main"] }
} ]
}
}



(thale jacobs) #10

I get the same results as you using your example. Thanks for posting
it. I am not sure why my original example does not work, but that is for
me to figure out! Thanks again.


