I've been trying to figure out a solution to this. I have some fields in my index that are
not_analyzed. Suppose the mapping looks something like this:
{
  "test-index-entities": {
    "mappings": {
      "testobject": {
        "properties": {
          "__location": {
            "type": "geo_point"
          },
          "activity_end": {
            "type": "long"
          },
          "activity_start": {
            "type": "long"
          },
          "attributes": {},
          "createdBy": {
            "type": "string",
            "index": "not_analyzed",
            "include_in_all": true
          }
        }
      }
    }
  }
}
Suppose the value of the "createdBy" attribute is "John Smith", and I
want to query ALL documents that have ANY field equal to "John Smith". Since
the "createdBy" field is not_analyzed, I expected the _all field to contain the
single term "John Smith".
But running this match query:
{
  "match": {
    "_all": "John Smith"
  }
}
doesn't return any results. I think that is because the match query analyzes
the query text itself and searches for the terms "john" and "smith" instead of
the single untokenized term "John Smith". With
{
  "match": {
    "createdBy": "John Smith"
  }
}
everything works fine, because ES sees in the mapping that the field is not
analyzed.
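To make the difference concrete, here is a rough Python sketch of the two analysis paths involved: a not_analyzed field indexes the whole value as one term, while analyzed query text is split and lowercased. `standard_like_analyze` is a simplified stand-in for Elasticsearch's standard analyzer (the real tokenizer follows Unicode word-boundary rules), not its actual implementation; both function names are hypothetical.

```python
import re


def keyword_analyze(text: str) -> list[str]:
    # not_analyzed: the entire value is indexed as a single exact term.
    return [text]


def standard_like_analyze(text: str) -> list[str]:
    # Simplified stand-in for the standard analyzer: split on runs of
    # non-alphanumeric characters, drop empty pieces, lowercase the rest.
    return [tok.lower() for tok in re.split(r"[^0-9A-Za-z]+", text) if tok]


value = "John Smith"
print(keyword_analyze(value))        # ['John Smith']
print(standard_like_analyze(value))  # ['john', 'smith']
```

Neither "john" nor "smith" is equal to the stored term "John Smith", which is the mismatch suspected above.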
If I had actually read the documentation I linked, I would have seen that my
assumption was wrong: copy_to fields also define their own analyzer and
do not receive the post-analysis tokens.
I'm not sure there is a clean way to achieve this in Elasticsearch. I
handle this use case with custom logic on the client side: my application
knows which fields are analyzed and which are not, and builds queries
accordingly.
--
Ivan
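A minimal Python sketch of the client-side approach described above, assuming the mapping's top-level `properties` are available as a dict (for example, fetched from the mapping API) and the ES 1.x query DSL. The helper `build_any_field_query` is hypothetical: it emits an exact `term` clause for each not_analyzed string field and a regular `match` clause for each analyzed string field, combined in a `bool` query.

```python
import json


def build_any_field_query(properties: dict, value: str) -> dict:
    """Hypothetical helper: query every string field in the way its mapping needs."""
    should = []
    for name, spec in sorted(properties.items()):
        if spec.get("type") != "string":
            continue  # skip long, geo_point, untyped objects, etc.
        if spec.get("index") == "not_analyzed":
            # term queries are not analyzed, so "John Smith" is looked up as-is
            should.append({"term": {name: value}})
        else:
            # analyzed fields: let match apply the field's own analyzer
            should.append({"match": {name: value}})
    return {"query": {"bool": {"should": should, "minimum_should_match": 1}}}


# Top-level properties from the mapping in the original question.
properties = {
    "__location": {"type": "geo_point"},
    "activity_end": {"type": "long"},
    "activity_start": {"type": "long"},
    "attributes": {},
    "createdBy": {"type": "string", "index": "not_analyzed", "include_in_all": True},
}
print(json.dumps(build_any_field_query(properties, "John Smith"), indent=2))
```

The query grows by one clause per string field, which is the drawback raised later in the thread for documents with many attribute names.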
On Tue, Jul 29, 2014 at 10:52 AM, Ivan Brusic ivan@brusic.com wrote:
The _all field has its own analyzer, so the analyzer defined on
the createdBy field is not applied.
I have never tried it, but I believe the best solution is to use "copy_to" to
a custom field:
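A sketch of the copy_to idea in Python, assuming ES 1.x mapping syntax and the behaviour Ivan describes (the target field applies its own mapping to the copied original value instead of receiving the source field's tokens). Under that assumption, a not_analyzed target field should hold each copied value as a single term. The target name `all_exact` and the helper `add_copy_to_exact` are hypothetical, and this has not been checked against a live cluster.

```python
import copy
import json

EXACT_FIELD = "all_exact"  # hypothetical name for the catch-all exact field


def add_copy_to_exact(properties: dict, target: str = EXACT_FIELD) -> dict:
    """Return a copy of `properties` where every not_analyzed string field
    also copies its value into a single not_analyzed target field."""
    result = copy.deepcopy(properties)
    for spec in result.values():
        if spec.get("type") == "string" and spec.get("index") == "not_analyzed":
            spec["copy_to"] = target
    # The target is not_analyzed too, so "John Smith" should stay one term there.
    result[target] = {"type": "string", "index": "not_analyzed"}
    return result


properties = {
    "activity_end": {"type": "long"},
    "createdBy": {"type": "string", "index": "not_analyzed", "include_in_all": True},
}
mapping = {"testobject": {"properties": add_copy_to_exact(properties)}}
print(json.dumps(mapping, indent=2))

# Exact lookup across all copied fields (ES 1.x query DSL).
query = {"query": {"term": {EXACT_FIELD: "John Smith"}}}
```

Only fields explicitly given `copy_to` feed the target, so analyzed text fields are kept out of it.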
Really, I want to search across all document fields, and enumerating every
possible attribute in a bool query is not a good option for me, since my
documents have a large number of different attribute names.
The approach we used before: instead of not_analyzed, we used our own encoder
to turn field values into strings without token delimiters. But we don't
think that is a good way to do it; we want to find an Elasticsearch solution,
since ES really does store not_analyzed values as single terms, and this
should be the fastest query I can imagine for ES: "find all documents
having the term 'John Smith'".
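For reference, an encoder workaround of this kind could look roughly like the Python sketch below. The original encoder isn't shown, so hex-encoding the UTF-8 bytes is an assumption: the result contains only [0-9a-f], so a word-splitting, lowercasing analyzer should keep it as one unchanged token, provided the value is short enough to stay under the tokenizer's maximum token length. The search value has to be encoded the same way. `encode_exact` and `decode_exact` are hypothetical names.

```python
def encode_exact(value: str) -> str:
    # Assumed encoding: UTF-8 bytes as lowercase hex, so the string has no
    # spaces or punctuation for an analyzer to split on.
    return value.encode("utf-8").hex()


def decode_exact(encoded: str) -> str:
    # Reverse the encoding when reading values back for display.
    return bytes.fromhex(encoded).decode("utf-8")


token = encode_exact("John Smith")
print(token)                # 4a6f686e20536d697468
print(decode_exact(token))  # John Smith

# Documents are indexed with the encoded value, and queries encode the same way.
query = {"query": {"match": {"_all": encode_exact("John Smith")}}}
```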
Surely we could create our own _all field instead of the built-in one, e.g.
"allnottokenized", which would contain the values of all not_analyzed
attributes in the document. But that is a lot of work: every time we index a
document, we need to check the mapping for every attribute to see whether it
is tokenized or not, and then fill in our own _all value.
It's strange that ES doesn't have such simple functionality out of the box.
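The indexing-time bookkeeping described here can be sketched in Python, assuming the mapping's top-level `properties` are available as a dict and that `allnottokenized` is itself mapped as a not_analyzed string. `fill_all_not_tokenized` is a hypothetical helper; it only looks at top-level fields, so values nested inside objects such as `attributes` would need a recursive walk.

```python
def fill_all_not_tokenized(doc: dict, properties: dict,
                           target: str = "allnottokenized") -> dict:
    """Return a copy of `doc` with all not_analyzed string values gathered
    into `target` (top-level fields only)."""
    exact_fields = {
        name for name, spec in properties.items()
        if spec.get("type") == "string" and spec.get("index") == "not_analyzed"
    }
    values = []
    for name, value in doc.items():
        if name in exact_fields:
            # A field may hold a single value or an array of values.
            values.extend(value if isinstance(value, list) else [value])
    enriched = dict(doc)
    enriched[target] = values
    return enriched


properties = {
    "activity_start": {"type": "long"},
    "createdBy": {"type": "string", "index": "not_analyzed"},
}
doc = {"createdBy": "John Smith", "activity_start": 1406620800000}
print(fill_all_not_tokenized(doc, properties))

# One exact term query then covers every not_analyzed attribute (ES 1.x DSL).
query = {"query": {"term": {"allnottokenized": "John Smith"}}}
```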
On Tuesday, July 29, 2014 9:55:24 PM UTC+4, Ivan Brusic wrote:
On Tue, Jul 29, 2014 at 10:28 AM, Alexey Sidelnikov <alexey.s...@reltio.com> wrote:
Hi all!