Problems with filters/facets on not_analyzed fields

Hello,

I am running into an issue with filters and facets on not_analyzed fields
(I also tried the keyword analyzer, with the same results). I have tested
this with ES 0.19.3, 0.20.4, and 0.90rc1. The fields appear to be analyzed
anyway: the facet terms are the tokenized, lowercased values, and filtering
on the exact term yields no results. Here is a simple test:

curl -X POST "http://localhost:9200/testing"

curl -X PUT "http://localhost:9200/testing/foo/_mapping" -d \
'{"foo":{"properties":{"title":{"type":"string","analyzer":"standard"}},"tags":{"type":"string","index":"not_analyzed"}}}}'
curl -X POST "http://localhost:9200/testing/foo" -d '{"title" : "Foo", "tags" : ["One"]}'
curl -X POST "http://localhost:9200/testing/foo" -d '{"title" : "Foo Bar", "tags" : "Two Three"}'
curl -X POST "http://localhost:9200/testing/foo" -d '{"title" : "Foo Bar Baz", "tags" : ["One", "Two Three", "Four::Five"]}'
curl -X POST "http://localhost:9200/testing/_refresh"
curl -X GET "http://localhost:9200/testing/foo/_mapping"

Now if I check the facets for a match-all query, the terms I get back
appear to be analyzed ("Four::Five" becomes two terms, "four" and "five").

curl -X GET "http://localhost:9200/testing/foo/_search?pretty" -d \
'{"facets":{"tagsFacet":{"terms":{"field":"tags","size":10,"all_terms":false}}}}'

"facets": {
  "tagsFacet": {
    "_type": "terms",
    "missing": 0,
    "other": 0,
    "terms": [{
      "count": 2,
      "term": "two"
    }, {
      "count": 2,
      "term": "three"
    }, {
      "count": 2,
      "term": "one"
    }, {
      "count": 1,
      "term": "four"
    }, {
      "count": 1,
      "term": "five"
    }],
    "total": 8
  }
}

This is what I would expect:

"facets": {
  "tagsFacet": {
    "_type": "terms",
    "missing": 0,
    "other": 0,
    "terms": [{
      "count": 2,
      "term": "Two Three"
    }, {
      "count": 2,
      "term": "One"
    }, {
      "count": 1,
      "term": "Four::Five"
    }],
    "total": 5
  }
}

Similarly, filtering on an exact term yields no results.

curl -X GET "http://localhost:9200/testing/foo/_search?pretty" -d \
'{"filter":{"term":{"tags":"Two Three"}}}'

{
  "took" : 8,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "failed" : 0
  },
  "hits" : {
    "total" : 0,
    "max_score" : null,
    "hits" : [ ]
  }
}

I would expect this to return the two documents with exactly that tag. Any
help is greatly appreciated.

Thanks.

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

Also, it may be worth noting that this or something similar has been
brought up a handful of times before:


https://groups.google.com/forum/?fromgroups=#!searchin/elasticsearch/not_analyzed$20filter/elasticsearch/aqsppkqbZro/Vi_Sim9Cl_wJ


Hiya

    curl -X POST "http://localhost:9200/testing"
    curl -X PUT "http://localhost:9200/testing/foo/_mapping" -d \
    '{"foo":{"properties":{"title":{"type":"string","analyzer":"standard"}},"tags":{"type":"string","index":"not_analyzed"}}}}'
    curl -X GET "http://localhost:9200/testing/foo/_mapping"

Pretty printing your JSON really helps for debugging such problems.
Also, you check the mapping in the last request above, but you don't
mention the fact that the mapping is incorrect.
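
One way to act on the pretty-printing advice is to run the request body through a JSON parser before sending it. The following is a minimal sketch, assuming python3 is on the PATH; check_json is a made-up helper name, not part of curl or Elasticsearch.

```shell
# The request body exactly as sent in the original PUT mapping call.
bad='{"foo":{"properties":{"title":{"type":"string","analyzer":"standard"}},"tags":{"type":"string","index":"not_analyzed"}}}}'

# Hypothetical helper: pretty-prints its argument and returns 0 if it is
# valid JSON, returns non-zero otherwise. Assumes python3 is installed.
check_json() {
  printf '%s' "$1" | python3 -m json.tool
}

if check_json "$bad" > /dev/null 2>&1; then
  echo "body is valid JSON"
else
  echo "body is not valid JSON"
fi
```

Note that this only catches the extra closing brace. The misplaced "tags" key is still well-formed JSON, so that part only shows up when you read the indented output.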

Your PUT mapping is putting this JSON (note the bad nesting, and the
extra } at the end):

{
  "foo" : {
    "tags" : {
      "index" : "not_analyzed",
      "type" : "string"
    },
    "properties" : {
      "title" : {
        "type" : "string",
        "analyzer" : "standard"
      }
    }
  }
}
}

It should be putting this JSON:

{
  "foo" : {
    "properties" : {
      "title" : {
        "type" : "string",
        "analyzer" : "standard"
      },
      "tags" : {
        "index" : "not_analyzed",
        "type" : "string"
      }
    }
  }
}
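
To see why the misplaced definition leads to the output in the original post, here is a rough shell sketch. It assumes that, with "tags" outside "properties", the field fell back to a default analyzed string mapping, and it approximates that analysis with tr (lowercase, then split on non-alphanumeric characters), which is only a stand-in for the real analyzer. The function names analyzed_terms and exact_terms are made up for illustration.

```shell
# Tag values from the three test documents, one value per line.
tags='One
Two Three
One
Two Three
Four::Five'

# Assumption: rough stand-in for an analyzed string field
# (lowercase, then split on anything that is not a letter or digit).
analyzed_terms() {
  printf '%s\n' "$tags" | tr '[:upper:]' '[:lower:]' | tr -cs '[:alnum:]' '\n' | sed '/^$/d'
}

# not_analyzed: every value is indexed unchanged as a single term.
exact_terms() {
  printf '%s\n' "$tags"
}

echo "analyzed terms:"
analyzed_terms | sort | uniq -c

echo "not_analyzed terms:"
exact_terms | sort | uniq -c

# A term filter looks up the exact string, so count whole-line matches.
echo "analyzed matches for 'Two Three': $(analyzed_terms | grep -cFx 'Two Three')"
echo "not_analyzed matches for 'Two Three': $(exact_terms | grep -cFx 'Two Three')"
```

Under these assumptions the analyzed list has eight entries and the not_analyzed list has five, and only the not_analyzed list contains "Two Three" (twice). That lines up with the facet totals of 8 and 5 and with the empty filter result in the original post.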

clint


Thanks Clint, I'll pretty print that mapping next time!

(Replying to qjh's original post of Friday, March 29, 2013, 3:37:20 PM UTC-7.)