Elasticsearch not matching a 1-token analysed string value


(Emanuil Tolev) #1

Hello everybody,

I'm using a dynamic template to index two versions of each string field: an
analysed one and a non-analysed one. Elasticsearch 0.90.7 with no plugins,
HTTP transport.

This is the mapping for the type in question:

{
  "journal": {
    "dynamic_templates": [
      {
        "default": {
          "mapping": {
            "fields": {
              "{name}": {
                "index": "analyzed",
                "type": "{dynamic_type}",
                "store": "no"
              },
              "exact": {
                "index": "not_analyzed",
                "type": "{dynamic_type}",
                "store": "yes"
              }
            },
            "type": "multi_field"
          },
          "match": "*",
          "match_mapping_type": "string"
        }
      }
    ],
    "properties": {
      "admin": {
        "properties": {
          "owner": {
            "type": "multi_field",
            "fields": {
              "owner": {
                "type": "string"
              },
              "exact": {
                "type": "string",
                "index": "not_analyzed",
                "store": true,
                "omit_norms": true,
                "index_options": "docs",
                "include_in_all": false
              }
            }
          }
        }
      }
    }
  }
}

So each journal has an admin object, which has an owner key, which is a
string. So far, so good.
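To make the difference between the two sub-fields concrete, here is a minimal Python sketch of what each one would put into the index for the value 1892042X. It is an approximation, not Elasticsearch code: the hypothetical `standard_like` function only splits on non-alphanumeric characters and lowercases, whereas the real standard analyzer uses Unicode word-break rules and may also remove stop words depending on the version. `not_analyzed` keeps the whole value as a single, unchanged term.

```python
import re


def standard_like(text):
    """Rough stand-in for the standard analyzer (an assumption): split on runs
    of non-alphanumeric characters and lowercase every token."""
    return [tok.lower() for tok in re.split(r"[^0-9A-Za-z]+", text) if tok]


def not_analyzed(text):
    """not_analyzed: the whole value is indexed as one unchanged term."""
    return [text]


value = "1892042X"
print("admin.owner       ->", standard_like(value))  # ['1892042x']
print("admin.owner.exact ->", not_analyzed(value))   # ['1892042X']
```

Under this approximation the analysed sub-field never contains the uppercase term 1892042X at all, only its lowercased form.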

The problem is that this query works:

POST /index_name/journal/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "term": {
            "admin.owner.exact": "1892042X"
          }
        }
      ]
    }
  }
}

It returns 5 journals whose admin.owner is set to exactly that value.

But this one does not!

POST /index_name/journal/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "term": {
            "admin.owner": "1892042X"
          }
        }
      ]
    }
  }
}

Note that the .exact suffix is missing. So why doesn't ES match this? There's
no whitespace or anything like that; it's just 8 characters next to each
other. (I'd even have expected a match if the stored value in those 5 records
were "1892042X more tokens".)

I can obviously use .exact to get what I want, but I'd like to understand how
ES works in this regard, at least at a basic level...
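A toy inverted index shows why the second query misses. This is a hedged Python sketch, not how Lucene actually stores terms: `analyze`, `build_index` and `term_query` are hypothetical helpers, and `analyze` is a simplified split-and-lowercase approximation of the standard analyzer. The one behaviour it is meant to model is that a term query looks its term up verbatim, without analysing it.

```python
import re
from collections import defaultdict


def analyze(text):
    # Simplified approximation of the standard analyzer: split + lowercase.
    return [tok.lower() for tok in re.split(r"[^0-9A-Za-z]+", text) if tok]


def build_index(docs, analyzer):
    """Map each indexed term to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, value in docs.items():
        for term in analyzer(value):
            index[term].add(doc_id)
    return index


def term_query(index, term):
    # No analysis here: the query term must equal an indexed term exactly.
    return index.get(term, set())


docs = {1: "1892042X", 2: "1892042X more tokens"}
analysed = build_index(docs, analyze)     # like admin.owner
exact = build_index(docs, lambda v: [v])  # like admin.owner.exact

print(term_query(exact, "1892042X"))     # {1}
print(term_query(analysed, "1892042X"))  # set()
print(term_query(analysed, "1892042x"))  # {1, 2}
```

Under these assumptions, doc 2 ("1892042X more tokens") would not match the uppercase term query either, because its first token is also stored as 1892042x. On the exact field only doc 1 matches, since there the whole string is a single term.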

Thanks,
Emanuil

P.S. I also ran _explain against one of the records that should match,
sending the exact same queries to /index_name/journal/a_journal_id/_explain.
Here are the results:

(using .exact):
{
  "ok": true,
  "_index": "index_name",
  "_type": "journal",
  "_id": "a_journal_id",
  "matched": true,
  "explanation": {
    "value": 13.024077,
    "description": "sum of:",
    "details": [
      {
        "value": 12.947296,
        "description": "weight(admin.owner.exact:1892042X in 37942) [PerFieldSimilarity], result of:",
        "details": [
          {
            "value": 12.947296,
            "description": "score(doc=37942,freq=1.0 = termFreq=1.0\n), product of:",
            "details": [
              {
                "value": 0.99704796,
                "description": "queryWeight, product of:",
                "details": [
                  {
                    "value": 12.98563,
                    "description": "idf(docFreq=2, maxDocs=481298)"
                  },
                  {
                    "value": 0.07678087,
                    "description": "queryNorm"
                  }
                ]
              },
              {
                "value": 12.98563,
                "description": "fieldWeight in 37942, product of:",
                "details": [
                  {
                    "value": 1,
                    "description": "tf(freq=1.0), with freq of:",
                    "details": [
                      {
                        "value": 1,
                        "description": "termFreq=1.0"
                      }
                    ]
                  },
                  {
                    "value": 12.98563,
                    "description": "idf(docFreq=2, maxDocs=481298)"
                  },
                  {
                    "value": 1,
                    "description": "fieldNorm(doc=37942)"
                  }
                ]
              }
            ]
          }
        ]
      },
      {
        "value": 0.07678087,
        "description": "ConstantScore(:), product of:",
        "details": [
          {
            "value": 1,
            "description": "boost"
          },
          {
            "value": 0.07678087,
            "description": "queryNorm"
          }
        ]
      }
    ]
  }
}

(without .exact, the analysed version of the field):
{
  "ok": true,
  "_index": "index_name",
  "_type": "journal",
  "_id": "a_journal_id",
  "matched": false,
  "explanation": {
    "value": 0,
    "description": "Failure to meet condition(s) of required/prohibited clause(s)",
    "details": [
      {
        "value": 0,
        "description": "no match on required clause (admin.owner:1892042X)"
      },
      {
        "value": 0.07082304,
        "description": "ConstantScore(:), product of:",
        "details": [
          {
            "value": 1,
            "description": "boost"
          },
          {
            "value": 0.07082304,
            "description": "queryNorm"
          }
        ]
      }
    ]
  }
}

I already know that one query matches the doc and the other doesn't, but I
can't divine the reason for it from those explain results.
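The clue is in the failing clause: it reads admin.owner:1892042X, with an uppercase X, i.e. the query term was used exactly as typed. The Python sketch below walks an explain tree (a trimmed copy of the second response above) and compares each failing term with what a simplified split-and-lowercase approximation of the standard analyzer (an assumption, not the real analyzer) would have indexed. The helper names are hypothetical, and the regular expression only targets the "no match on required clause" wording seen in this particular output.

```python
import re


def analyze(text):
    # Simplified approximation of the standard analyzer: split + lowercase.
    return [tok.lower() for tok in re.split(r"[^0-9A-Za-z]+", text) if tok]


CLAUSE = re.compile(r"no match on required clause \(([^:]+):(.+)\)")


def failed_clauses(node):
    """Yield (field, term) for every failed required clause in an explain tree."""
    match = CLAUSE.search(node.get("description", ""))
    if match:
        yield match.group(1), match.group(2)
    for child in node.get("details", []):
        yield from failed_clauses(child)


# Trimmed copy of the "explanation" object from the second _explain response.
explanation = {
    "value": 0,
    "description": "Failure to meet condition(s) of required/prohibited clause(s)",
    "details": [
        {"value": 0, "description": "no match on required clause (admin.owner:1892042X)"},
        {"value": 0.07082304, "description": "ConstantScore(:), product of:"},
    ],
}

for field, term in failed_clauses(explanation):
    print(f"{field}: searched for {term!r}, analysed value would be {analyze(term)!r}")
# admin.owner: searched for '1892042X', analysed value would be ['1892042x']
```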

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/070b4634-343f-43e2-bfec-1216ee4d80a1%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


(David Pilato) #2

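Because it has been analyzed: at index time the value was converted to
lowercase, so the analysed admin.owner field holds the term 1892042x. A term
query is not analyzed, so it looks for 1892042X exactly as given and finds
nothing.

Try with a match query (MatchQuery) instead.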
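Unlike a term query, a match query first runs the query text through the field's analyzer and then looks up each resulting term, combining them with OR by default. The Python sketch below illustrates that flow with a hypothetical `match_query` helper and a simplified split-and-lowercase analyzer (an assumption, not the real standard analyzer). It also prints the corresponding request body, which is the failing query with "term" replaced by "match".

```python
import json
import re
from collections import defaultdict


def analyze(text):
    # Simplified approximation of the standard analyzer: split + lowercase.
    return [tok.lower() for tok in re.split(r"[^0-9A-Za-z]+", text) if tok]


def build_index(docs):
    """Inverted index for the analysed field: term -> set of doc ids."""
    index = defaultdict(set)
    for doc_id, value in docs.items():
        for term in analyze(value):
            index[term].add(doc_id)
    return index


def match_query(index, text):
    # Analyse the query text with the same analyzer as the field, then OR
    # together the documents matching each resulting term.
    hits = set()
    for term in analyze(text):
        hits |= index.get(term, set())
    return hits


index = build_index({1: "1892042X", 2: "1892042X more tokens", 3: "something else"})
print(match_query(index, "1892042X"))  # {1, 2}

# Request body: the failing query with "term" swapped for "match".
body = {"query": {"bool": {"must": [{"match": {"admin.owner": "1892042X"}}]}}}
print(json.dumps(body, indent=2))
```

Because both sides pass through the same analyzer, the query text and the indexed value end up as the same lowercased term, which is why the match query finds the documents the term query missed.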

--
David :wink:
Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs

On 25 March 2014, at 15:08, Emanuil Tolev emanuil@cottagelabs.com wrote:



(Emanuil Tolev) #3

Thanks David, that worked beautifully. The TermQuery docs do say that your
term won't be analysed!

Thanks,
Emanuil

On Tuesday, March 25, 2014 2:17:18 PM UTC, David Pilato wrote:

Because it has been analyzed and converted to lowercase.

Try with MatchQuery.



(system) #4